
Open topics

MS2017-02: Big Data Psychiatry Architecture

posted May 23, 2017, 6:50 AM by Marco Spruit   [ updated Aug 30, 2017, 2:42 AM ]

At the Psychiatry department of the UMCU, we are working on bringing the results of data analysis into daily working practice. To be able to utilize data from the various sources within the UMCU, an ETL process is being created that ensures automatic and regular updates of all data, preprocessed into a format that is suitable for analysis. Currently, a set of pipelines exists that performs operations on the data. This process is implemented using various tools and programming languages, such as GitLab, Jenkins, SAS, Python and R.

We have several open topics and questions in this area that could be suitable for an MBI thesis or a capita selecta project. For example:
  1. What is a good way to govern the system, i.e., who has access to what data? Where in the process and by whom should this be arranged?
  2. How can we design, measure and improve the usability of the system?
  3. How can the architecture of the system be described and evaluated?
  4. How can we automatically create interesting dashboards, reports or other visualisations that remain interesting after the first few iterations?
Some topics might be suitable for a capita selecta project, especially the system architecture, ideally resulting in a paper. For an MBI thesis, the topics can be combined or further extended.

Please contact Vincent Menger or Marco Spruit (MS) for more information.

MS2017-01: Understanding unstructured clinical data in hospitals before utilizing it

posted Jan 6, 2017, 9:15 AM by Marco Spruit

Many research papers state that clinical data are captured in narrative texts (e.g., clinical notes, discharge letters) and propose techniques and methods to transform such unstructured information into structured data for secondary use. However, little research has been conducted to answer questions about the data itself in the first place:
  • What unstructured clinical data are captured in hospitals?
  • Why are such data captured as free text?
  • Are there any differences in unstructured clinical data among hospitals?
  • How can we quantitatively measure how unstructured these data are, or how difficult they are to process?
  • etc.
In many cases, researchers and practitioners skip such questions and move directly to developing tools that transform unstructured data into structured information. We believe a better understanding of such data will provide useful insights for developing data transformation tools and for improving the efficiency and effectiveness of existing tools in practice.

Overall supervisor: Marco; Daily supervisor: Ian Shen.

MS2016-12: Data Quality Improvement in Data Space Environments

posted Oct 26, 2016, 8:17 AM by Marco Spruit   [ updated Oct 27, 2016, 12:15 AM ]

The Research and Documentation Centre (WODC) of the Dutch Ministry of Security and Justice uses many different, heterogeneous data sets in its research. The need for integration depends strongly on the research being done, which is why the data sets are managed using a data space approach. In this approach, data integration and other data quality improvements are initiated by the need for them in specific research projects (pay-as-you-go). However, decentralizing data quality improvement is not always the most efficient way: when several projects encounter the same data quality issues, collaboration on resolving them is desirable, and the issues are also likely to become more urgent to solve. The WODC wants more insight into the impact of known data quality issues, and is looking for ways to determine which issues are better solved in a more centralized way.

NB: Because the data and documentation are in Dutch, an understanding of written Dutch is preferred.

MS2016-11: Knowledge Provenance System for Data Space Architectures

posted Oct 26, 2016, 8:16 AM by Marco Spruit   [ updated Oct 27, 2016, 12:15 AM ]

The Research and Documentation Centre (WODC) of the Dutch Ministry of Security and Justice uses many different, heterogeneous data sets in its research. The need for integration depends strongly on the research being done, which is why the data sets are managed using a data space approach. In this approach, mappings are used to transform data sets so that they are useful for specific research projects; these mappings are therefore volatile and highly flexible. Insight into these mappings, and especially into why they are performed, is essential to understand the research data set. So, besides the provenance of the data itself, the WODC is also interested in storing the associated knowledge in a transparent yet maintainable way.

NB: Because the data and documentation are in Dutch, an understanding of written Dutch is preferred.
