Please refer to my Research Projects page for active projects.

This Self-Service Data Science agenda sets out my research plans in Applied Data Science and Natural Language Processing for healthcare. It comprises four objectives, three research themes and a set of expected outcomes. The objectives are:


  1. Construct an authoritative language model for Dutch healthcare.
  2. Design and deploy an open, online, self-service, patient-sensitive platform for healthcare.
  3. Improve Dutch healthcare processes in a trustworthy manner using data science technologies.
  4. Identify governance controls for technology acceptance by patients and professionals in practice.

Transfer Learning for Natural Language Understanding

Ever since the 1980s, researchers have sought to understand the cognitive architecture of human language capability by connecting the subsymbolic (neural, probabilistic) and symbolic (logical, grammatical) approaches to Natural Language Processing (NLP). Following this line of research, we will combine both approaches for Dutch healthcare texts. On the subsymbolic side, we will investigate Transfer Learning techniques through experiments with pre-trained contextual language models (e.g. BERT/ClinicalBERT, HuggingFace), with a focus on federated healthcare language modelling (e.g. PySyft). On the symbolic side, we will develop rule-based models using Dutch Dependency Grammar and regular expressions (e.g. Alpino, DEDUCE). We will then perform a series of experiments to optimise our language model architecture with linguistic features, structured EHR data and external data sets (e.g. ontology matching, CBS demographics). The intended result is an authoritative language model for deep understanding of Dutch healthcare communications.
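To make the rule-based strand more concrete, the sketch below shows the general idea behind regular-expression de-identification of Dutch clinical text, in the spirit of DEDUCE. It is a minimal illustration using only Python's standard `re` module; the `deidentify` function, the tag names and the four patterns are hypothetical simplifications and do not reproduce DEDUCE's actual rules or API.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical, simplified patterns for illustration only. They are NOT
# DEDUCE's rules, which cover many more identifier types and edge cases.
PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("DATE", re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b")),          # e.g. 03-11-2021
    ("PHONE", re.compile(r"\b06-?\d{8}\b")),                     # Dutch mobile numbers
    ("POSTCODE", re.compile(r"\b\d{4}\s?[A-Z]{2}\b")),           # e.g. 1234 AB
    ("NAME", re.compile(r"\b(?:[Dd]hr\.|[Mm]evr\.)\s+[A-Z][a-z]+\b")),  # titled surnames
]


def deidentify(text: str) -> str:
    """Replace every match of each pattern, in list order, with a category tag."""
    for tag, pattern in PATTERNS:
        text = pattern.sub(f"<{tag}>", text)
    return text


if __name__ == "__main__":
    note = "Mevr. Jansen (06-12345678) is op 03-11-2021 gezien; woont in 1234 AB."
    print(deidentify(note))
```

Pattern order matters in such rule sets: each substitution sees the output of the previous one, so broader patterns are usually placed after more specific ones to avoid partial overlaps.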

Multimodal Machine Learning for Self-Service Data Science

Apart from the wealth of recorded textual data, many more potential sources of structured data are available. We will therefore extend text analysis methods with these heterogeneous data sets using Multimodal Machine Learning, for example to perform risk profiling with structured EHR data (e.g. survey data) and wellbeing detection with personalised sensor data (e.g. wearables). Additionally, we will focus on Automated Machine Learning (AutoML) techniques for Self-Service Data Science, enabling (semi-)automated data exploration guided by professionals. We will also investigate Interactive Visualisation and Verbalisation techniques that promote Explainable Artificial Intelligence (XAI) by providing model-agnostic and model-specific explanations of machine learning predictions (e.g. LIME). These findings will subsequently be implemented in a Clinical Decision Support System (CDSS) for daily healthcare practice.
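LIME explains a single prediction by fitting a local surrogate model around it. A simpler technique from the same model-agnostic family is occlusion (feature ablation), which likewise needs only black-box access to a prediction function. The sketch below uses occlusion rather than LIME itself; `toy_risk_model`, its coefficients, the feature names and the baseline values are hypothetical stand-ins for a trained risk-profiling model and a reference population.

```python
import math
from typing import Callable, Dict

Features = Dict[str, float]


def toy_risk_model(x: Features) -> float:
    """Hypothetical black-box risk model: a logistic score over three features."""
    z = -4.0 + 0.05 * x["age"] + 0.8 * x["smoker"] + 0.02 * x["systolic_bp"]
    return 1.0 / (1.0 + math.exp(-z))


def occlusion_attributions(
    predict: Callable[[Features], float], instance: Features, baseline: Features
) -> Dict[str, float]:
    """Model-agnostic local explanation: for each feature, the change in the
    prediction when only that feature is replaced by its baseline value."""
    reference = predict(instance)
    attributions: Dict[str, float] = {}
    for name in instance:
        perturbed = dict(instance)  # copy, so the original instance is untouched
        perturbed[name] = baseline[name]
        attributions[name] = reference - predict(perturbed)
    return attributions


if __name__ == "__main__":
    patient = {"age": 70.0, "smoker": 1.0, "systolic_bp": 150.0}
    population_mean = {"age": 50.0, "smoker": 0.0, "systolic_bp": 120.0}  # illustrative baseline
    scores = occlusion_attributions(toy_risk_model, patient, population_mean)
    for feature, score in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
        print(f"{feature}: {score:+.3f}")
```

A positive score means that the patient's actual value for that feature raised the predicted risk relative to the baseline. Such per-feature scores are one possible input for the visualisation and verbalisation of explanations in a CDSS.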

Federated Learning for Distributed Deployment

Interorganisational interoperability is essential for our proposed trustworthy, patient-sensitive platform. We will therefore extend the local institutions' analytics architectures with, and embed them in, an interorganisational Federated Learning infrastructure (e.g. PySyft). This “Compute Visits Data” knowledge discovery process, in which the analysis is brought to the data rather than the data to a central location, requires both organisational and technical guidelines, which we will evaluate using Action Design Research. To ensure continuous trust as well as security and privacy validation (e.g. ISFAM), we will systematically perform Penetration Tests (e.g. Kali Linux) and organise several white-hat hackathons that attempt to re-identify data and test organisational security. All findings will be included in a standardised user manual and in open source repositories (e.g. GitHub). Finally, we will identify computational and organisational governance controls regarding algorithm fairness (e.g. Fairlearn, AI Fairness 360), and determine how best to communicate these controls to patients (e.g. self-management dashboard).
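The “Compute Visits Data” principle can be illustrated with Federated Averaging (FedAvg), one common Federated Learning aggregation scheme: each institution trains the shared model on its own records, and only the resulting model parameters are sent back and averaged, weighted by local data set size. The sketch below is a minimal, dependency-free simulation under stated assumptions; it does not use PySyft's API, and the one-feature linear model, the synthetic `y = 2x + 1` data and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (feature, label) pairs that stay at one institution
Model = Tuple[float, float]          # (weight, bias) of a one-feature linear model


def local_update(model: Model, data: Dataset, lr: float = 0.3, epochs: int = 5) -> Model:
    """Full-batch gradient descent on mean squared error, using local data only."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_average(updates: List[Model], sizes: List[int]) -> Model:
    """FedAvg aggregation: average client models, weighted by local data set size."""
    total = sum(sizes)
    w = sum(m[0] * n for m, n in zip(updates, sizes)) / total
    b = sum(m[1] * n for m, n in zip(updates, sizes)) / total
    return w, b


def run_federated_training(datasets: List[Dataset], rounds: int = 50) -> Model:
    """Each round, the global model visits every institution, is trained there,
    and only the returned parameters are aggregated centrally."""
    model: Model = (0.0, 0.0)
    sizes = [len(d) for d in datasets]
    for _ in range(rounds):
        updates = [local_update(model, d) for d in datasets]
        model = federated_average(updates, sizes)
    return model


def make_synthetic_site(rng: random.Random, n: int) -> Dataset:
    """Synthetic, noise-free records following y = 2x + 1 (purely illustrative)."""
    return [(x, 2 * x + 1) for x in (rng.random() for _ in range(n))]


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [make_synthetic_site(rng, n) for n in (40, 60, 100)]
    w, b = run_federated_training(sites)
    print(f"global model: y = {w:.3f}x + {b:.3f}")
```

Weighting by data set size keeps larger institutions from being under-represented in the global model. A real deployment would typically also need secure aggregation and privacy safeguards on top of this basic scheme, which is where the penetration tests and hackathons described above come in.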

Expected outcomes

  1. Language model for deep understanding of Dutch healthcare communications
  2. Multimodal machine learning algorithms for Dutch healthcare
  3. Knowledge discovery process model for data science in healthcare
  4. Self-service data science clinical decision support system for Dutch healthcare
  5. Governance controls for multi-stakeholder technology acceptance
  6. COVIDA distributed patient-sensitive knowledge deployment platform for continuous learning