
Open topics

Below are some MBI thesis project ideas which I would find interesting to supervise. 

MS2018-03: Automatic Text Summarization from Grant Applications

posted Mar 23, 2018, 2:06 AM by Marco Spruit

  • Full title: Automatic Text Summarization of Research Data Management Strategies from Grant Applications With Natural Language Processing (NLP) and Deep Learning (DL) approaches 
  • Focus: Data science applied to research grants 
  • Type of research: simulation, experiment, design science, text mining, information retrieval
  • Collaboration: This project is conducted in collaboration with IDFUSE. It is not an internship but a full-time research project at the UU. The collaboration with IDFUSE fosters the exchange of data (i.e. real research grant applications from researchers) and knowledge (effective NLP/DL approaches). IDFUSE developed an IT application that helps researchers improve the impact of their proposals by evaluating specific aspects of the knowledge utilization paragraphs. In this project, similar analyses will be extended to data management paragraphs.
See the attached text for more info.

--Armel and Marco

MS2018-02: NLP for medical psychiatric records

posted Feb 20, 2018, 3:17 AM by Marco Spruit   [ updated Feb 20, 2018, 3:19 AM ]

"Utilizing free text from the medical psychiatric record" 
-> Since the medical text is written in Dutch, this project is only suitable for someone who is a native Dutch speaker.

At the psychiatry department of the UMCU, our data science team works on bringing the results of data analysis to the daily practice of the psychiatry work floor. Over the past two years, we have been working on creating an environment that allows working with patient data, and an infrastructure that makes the diverse types of patient data that are gathered available for analysis – partially with the help of MBI students. Currently, we are looking for a motivated MBI student with an interest in data analysis for the following problem:

Much of the data gathered in Electronic Health Records is typed in free-text format by nurses and psychiatrists, for example in doctor/nurse notes, treatment plans, incident reports, and BHOM measurements. Text data poses additional challenges compared to structured data. However, we believe that it contains information that is currently largely unused and waiting to be explored. Although the exact topic of research is still very open, we have some open questions:
  • Can we accurately capture a patient's current wellbeing or detect events based on what is written in medical text?
  • How can we represent text to perform information retrieval (e.g. extract information that is not captured in structured data to use as input for research)?
  • How can we represent text to predict or classify several outcomes (e.g. length of stay, aggression, diagnosis, symptoms)?
  • How can we enable researchers who do not have a technical background to utilize text data for their research?
  • Other questions or combinations of the above based on your interests.
For the project, you will first have to become acquainted with the psychiatry domain and the text data that is gathered, and with current NLP techniques in the medical field (in Dutch/English). We will then define a research problem based on the questions described above, which you will solve with a novel NLP technique. Solving the problem will require hands-on work with the data. Most NLP problems are currently tackled in Python, but other methods or programming environments may turn out to be more suitable. We can offer you a chance to work with actual patient data in a challenging environment. The actual work with the patient data has to be conducted within the UMCU (at the Uithof). For the rest of the project, you are free to work where and when it suits you.

Please contact Vincent Menger or Marco for more information.

MS2018-01: Text Analytics for interview analysis automation

posted Jan 4, 2018, 9:24 AM by Marco Spruit   [ updated Jan 11, 2018, 6:29 AM ]

This project idea is a collaboration with the Utrecht University School of Economics (U.S.E.), which will provide a recent reference database of expert interview audio files and/or transcripts in English related to entrepreneurial clusters. These need to be transcribed, analysed, and interpreted.

We want you to explore the extent to which this can be automated or even enriched using Natural Language Processing (NLP) techniques. Put differently, we want you to research the extent to which a typical NVivo interview transcription and content analysis process can be automated and enriched.

We expect this Applied Data Science project to measure and improve the overall data quality of colloquial speech processing tasks, strategies for textual content enrichment, dialogue segmentation, and intelligent research method support, among others.

You need to be able to prototype your NLP solution in order to evaluate it interactively and iteratively in the USE case study. Additionally, much other interview data is available from previous research.

MS2017-04: Anomaly detection for industrial sensor data

posted Nov 27, 2017, 3:36 AM by Marco Spruit

With the advance of the Internet of Things (IoT), mechanical equipment, ranging from elevators and vehicles to aircraft and wind turbines, is now typically instrumented with numerous sensors that constantly capture the behavior and health of the machine. These sensors have been used to create systems that monitor devices in real time. Besides real-time monitoring systems, both researchers and practitioners are working on using the data collected by these sensors to profile device failures and, in some cases, even to build models that predict them. Because labelled datasets are scarce, building predictive models with supervised learning is very difficult and time-consuming. Unsupervised anomaly detection is therefore often a better option for handling such data (Malhotra et al., 2015; Malhotra et al., 2016; Park et al., 2017).

The aim of this project is to explore the use of various anomaly detection techniques, including one-class SVM, PCA, and LSTMs, on industrial sensor data collected by Shell. The results of your anomaly detection will provide useful insights for maintenance and help create more efficient maintenance plans.
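
To give a feel for one of the techniques named above, here is a minimal sketch of PCA-based anomaly detection using reconstruction error. It is only an illustration under stated assumptions: the three "sensors" and their readings are synthetic and invented for this example (they are not Shell data), all function and variable names are hypothetical, and only the first principal component is kept, estimated with plain power iteration so that the sketch needs nothing beyond the Python standard library.

```python
import math
import random


def fit_first_component(rows, iterations=100):
    """Estimate the mean and the first principal component by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Start vector; assumed not orthogonal to the dominant direction.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def reconstruction_error(row, mean, v):
    """Distance between a reading and its projection onto the component."""
    c = [x - m for x, m in zip(row, mean)]
    score = sum(a * b for a, b in zip(c, v))
    residual = [a - score * b for a, b in zip(c, v)]
    return math.sqrt(sum(x * x for x in residual))


def fit_threshold(rows, mean, v, k=3.0):
    """Threshold = mean + k * standard deviation of the training errors."""
    errors = [reconstruction_error(r, mean, v) for r in rows]
    mu = sum(errors) / len(errors)
    sd = math.sqrt(sum((e - mu) ** 2 for e in errors) / (len(errors) - 1))
    return mu + k * sd


# Synthetic "normal" operation: three sensors driven by one hidden load.
rng = random.Random(0)
normal_rows = []
for _ in range(200):
    load = rng.uniform(0.0, 10.0)
    normal_rows.append([
        load + rng.gauss(0.0, 0.1),
        0.5 * load + rng.gauss(0.0, 0.1),
        2.0 * load + rng.gauss(0.0, 0.1),
    ])

mean, component = fit_first_component(normal_rows)
threshold = fit_threshold(normal_rows, mean, component)


def is_anomalous(row):
    """Flag a reading whose reconstruction error exceeds the threshold."""
    return reconstruction_error(row, mean, component) > threshold


consistent_reading = [4.0, 2.0, 8.0]  # follows the usual sensor relationship
broken_reading = [5.0, 2.5, 4.0]      # third sensor far below its usual value
print(is_anomalous(consistent_reading), is_anomalous(broken_reading))
```

The model is fitted on unlabelled readings only, which is what makes the approach attractive when failure labels are scarce. In a real project you would more likely use an established library and keep several components. The three-sigma threshold is likewise just one simple choice among many.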

You will be working with a dedicated data science team at Shell (globally), and will have the opportunity to solve real business problems with your advanced data techniques through an internship. To get an internship there, you need to pass their online recruitment test and a short phone interview.

Contact Ian for more details.


  • Malhotra, P., Vig, L., Shroff, G., & Agarwal, P. (2015). Long short term memory networks for anomaly detection in time series. In Proceedings (p. 89). Presses universitaires de Louvain.
  • Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.
  • Park, D., Hoshi, Y., & Kemp, C. C. (2017). A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-based Variational Autoencoder. arXiv preprint arXiv:1711.00614. 
  • Luo, C., Yang, D., Huang, J., & Deng, Y. D. (2017). LSTM-Based Temperature Prediction for Hot-Axles of Locomotives. In ITM Web of Conferences (Vol. 12, p. 01013). EDP Sciences.
