
Open topics

MS2017-02: Big Data Psychiatry

posted May 23, 2017, 6:50 AM by Marco Spruit   [ updated May 23, 2017, 6:50 AM ]

At the Psychiatry department of the UMCU, we are working on bringing the results of data analysis into daily working practice. To utilize data from the various sources within the UMCU, an ETL process is being created that ensures automatic and regular updates of all data, preprocessed into a format that is suitable for analysis. Currently, a set of pipelines exists that performs operations on the data. This process is implemented using various tools and programming languages, such as GitLab, Jenkins, SAS, Python and R.

We have several open topics and questions about this that can be suitable for an MBI thesis or a capita selecta project. For example:
  1. What is a good way to govern the system, i.e. who has access to what data? Where in the process and by whom should this be arranged?
  2. How can we design, measure and improve the usability of the system?
  3. How can the architecture of the system be described and evaluated?
  4. How can we automatically create interesting dashboards, reports or other visualisations that remain interesting after the first few iterations?
Some topics might be suitable for a capita selecta project, especially the system architecture, ideally resulting in a paper. For an MBI thesis, the topics can be combined or further extended.

Please contact Vincent Menger or MS for more info.

MS2017-01: Understanding unstructured clinical data in hospitals before utilizing it

posted Jan 6, 2017, 9:15 AM by Marco Spruit

Many research papers state that clinical data are captured in narrative texts (e.g. clinical notes, discharge letters) and propose techniques or methods to transform such unstructured information into structured data for secondary use. However, little research has been conducted to answer questions about the data itself in the first place:
  • What are the unstructured clinical data captured in hospitals?
  • Why are such data captured in free text?
  • Are there any differences in unstructured clinical data among hospitals?
  • How can we quantitatively measure how unstructured these data are, or how difficult they are to process?
  • etc.
In many cases, researchers or practitioners skip such questions and move directly to developing tools that transform unstructured data into structured information. We believe a better understanding of such data will provide useful insights for developing data transformation tools and for improving the efficiency and effectiveness of existing tools in practice.

Overall supervisor: Marco; Daily supervisor: Ian Shen.

MS2016-12: Data Quality Improvement in Data Space Environments

posted Oct 26, 2016, 8:17 AM by Marco Spruit   [ updated Oct 27, 2016, 12:15 AM ]

The Research and Documentation Centre (WODC) of the Dutch Ministry of Security and Justice uses many different, heterogeneous data sets in its research. The need for integration depends strongly on the research being done, which is why the data sets are managed using a data space approach. In this approach, data integration and other data quality improvements are initiated only when specific research projects need them (pay-as-you-go). However, decentralizing data quality improvement is not always the most efficient way: when several projects encounter the same data quality issues, collaboration on resolving them is desirable, and the issues are also likely to become more urgent. The WODC wants more insight into the impact of known data quality issues and is looking for ways to determine which issues are better solved in a more centralized way.
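
As a minimal sketch of the reasoning above, one simple starting point is to count how many projects report the same issue on the same data set and to treat widely shared issues as candidates for centralized resolution. The issue log, project names, data set names and the `min_projects` threshold below are all hypothetical and serve only as illustration; they do not describe actual WODC data or practice.

```python
from collections import defaultdict

# Hypothetical issue log: (research project, data set, data quality issue).
# All names below are invented for illustration only.
ISSUE_LOG = [
    ("project_a", "dataset_1", "missing birth dates"),
    ("project_b", "dataset_1", "missing birth dates"),
    ("project_c", "dataset_1", "missing birth dates"),
    ("project_a", "dataset_2", "inconsistent offence codes"),
    ("project_b", "dataset_3", "duplicate person records"),
    ("project_c", "dataset_3", "duplicate person records"),
]


def projects_per_issue(log):
    """Map each (data set, issue) pair to the set of projects that reported it."""
    affected = defaultdict(set)
    for project, dataset, issue in log:
        affected[(dataset, issue)].add(project)
    return dict(affected)


def centralization_candidates(log, min_projects=2):
    """Return issues reported by at least `min_projects` projects, most shared first."""
    affected = projects_per_issue(log)
    shared = [
        (key, len(projects))
        for key, projects in affected.items()
        if len(projects) >= min_projects
    ]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


candidates = centralization_candidates(ISSUE_LOG)
for (dataset, issue), count in candidates:
    print(f"{dataset}: {issue} (reported by {count} projects)")
```

Counting affected projects is only one possible proxy for impact; a thesis on this topic would likely also weigh severity and the cost of fixing each issue.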

NB: Due to the Dutch data and documentation, understanding of written Dutch is preferable.

MS2016-11: Knowledge Provenance System for Data Space Architectures

posted Oct 26, 2016, 8:16 AM by Marco Spruit   [ updated Oct 27, 2016, 12:15 AM ]

The Research and Documentation Centre (WODC) of the Dutch Ministry of Security and Justice uses many different, heterogeneous data sets in its research. The need for integration depends strongly on the research being done, which is why the data sets are managed using a data space approach. In this approach, mappings are used to transform data sets so that they are useful for specific research projects; the mappings are therefore volatile and highly flexible. Insight into these mappings, and especially into why they are performed, is essential to understand the research data set. So, besides provenance of the data itself, the WODC is also interested in storing the associated knowledge in a transparent yet maintainable way.

NB: Due to the Dutch data and documentation, understanding of written Dutch is preferable.

MS2016-10: Anticipating floods in Jakarta

posted Oct 7, 2016, 7:32 AM by Marco Spruit   [ updated Oct 7, 2016, 7:33 AM ]

During the rainy season, many parts of Jakarta flood. Part of this is caused by construction that extends the city ever further inland and obstructs natural water flows. Part of it is also caused by blockage of riverbeds by garbage and slums. Because of the latter, it is impossible to create accurate models of the whole delta and predict floods accurately. The riverbeds and sewage canals change too often (and illegally) to be reliably modeled.

In the Peta Jakarta project, a consortium led by the University of Wollongong tried to use Twitter reports to obtain real-time information about the flooding situation in order to accurately anticipate and react to the floods. Although there were some partial successes, it seemed that people were not willing to send official flooding reports via Twitter. However, they did tweet a lot to friends and relatives. These tweets contain no direct flood reports, but indirectly they give a lot of information that might be used.

In this project, the student will examine the collected Twitter data and determine which linguistic techniques can be used to extract as much information as possible about the flooding situation.
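
To give a rough idea of the simplest kind of technique that could serve as a baseline, the sketch below spots a few Indonesian keywords in tweets and maps them to signal categories. The lexicon and the example tweets are invented for illustration and are not taken from the project data; which terms actually indicate flooding is exactly what the thesis would need to investigate.

```python
import re

# Hypothetical keyword lexicon (Indonesian term -> signal category).
# A real study would derive and validate such terms from the collected data.
FLOOD_SIGNALS = {
    "banjir": "flood",
    "hujan": "rain",
    "macet": "traffic_jam",
}

# Invented example tweets, not taken from the project data.
TWEETS = [
    "Hujan deras dari pagi, jalanan macet total",
    "Selamat ulang tahun!",
    "Rumah tante kena banjir lagi",
]


def extract_signals(text, lexicon=FLOOD_SIGNALS):
    """Return the sorted signal categories whose keyword occurs as a whole word."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sorted({category for word, category in lexicon.items() if word in tokens})


for tweet in TWEETS:
    print(extract_signals(tweet), "<-", tweet)
```

Plain keyword matching ignores negation, slang and spelling variation, so it would mainly be useful as a point of comparison for more advanced linguistic techniques.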

More information about the original project can be found on the project website or obtained from Dr. Frank Dignum.

Starting time: any time from October 2016.

MS2016-08: Maturity Modeling for Research Data Management

posted Jun 24, 2016, 6:32 AM by Marco Spruit

Research Data Management is a very recent field of study that tackles issues such as proper storage and preservation of research data, verification and reuse of datasets from published studies, and assisting scientists and research labs in managing research assets properly.

As part of the "Maturity Models for RDM" topic, you will conduct a literature review to identify organizational, computational and domain-specific factors that affect how research data is managed by scientists. Next, based on your literature review, you will present a design proposition for a maturity model that organizes these factors into maturity levels with appropriate capabilities and metrics. Finally, you will validate the maturity model by conducting interviews with researchers and research IT (meta)data managers, preferably in different institutions and research fields.

Please contact Armel Lefebvre for more information about this topic.

MS2015-07: AI in mHealth

posted Nov 24, 2015, 8:29 AM by Marco Spruit   [ updated Jan 27, 2016, 7:47 AM ]

A Dutch start-up company has an immediate opening for a master's student who wants to graduate on an application of artificial intelligence (AI) in the healthcare sector.

The company wants to stimulate behavior change by means of, among other things, professional coaching, and has developed an ICT platform for smartphone apps and devices for this purpose. The platform has been tested in pilots and now works very well. The pilots show that the approach so far strengthens the feeling of a personal approach. However, there are also limitations. The range of possible activities for behavior change keeps growing, which makes it difficult for the user to choose among them. Moreover, that choice is not always equally effective. The company has therefore cautiously started an AI track to optimize the matching between user and offering.

The company would like to know how optimal matching can be built into the platform. Some form of AI will very likely be able to contribute to this. But which form? We ask the student for well-founded advice and, if possible, the start of the development. It is important that the research is simulated in a test environment, so that certainty is obtained about how it works. Your scientific contribution will possibly take the form of a meta-algorithmic model ('recipe') for this problem domain of behavior change via mHealth, which you evaluate by means of a prototype implementation using real data.

You will work at a small organization that is one to two years ahead of its Dutch competitors in this field. The company is currently on the verge of breaking through. AI helps us to become even better. Your contribution is therefore important. The THA team is highly multidisciplinary, smart, original, enthusiastic and pragmatic. We value creativity. Within THA you will be supervised by the director of science and the front-end designer. If all goes well, there is a chance that you can grow within the company.

In addition, a reasonable internship allowance will be agreed upon by mutual consultation.

We would prefer to start immediately, and in any case still in 2015. Make sure that you can be present at our office in Utrecht every Thursday. Further planning in consultation.

Make sure you have permission and supervision from your degree programme. You will also have to sign a non-disclosure agreement.

Ask Marco for more information.
