Here's a blog which highlights some of the more memorable events during my daily routine... Events include accepted or rejected papers (ACCEPT/REJECT), master's thesis defenses by students I supervised (MBI), research presentations of papers I (co-)authored (TALK), grant awards and rejections (ACCEPT/REJECT), and important research interest statements, among others.

TALKS: Artificial Intelligence for Medication Reviews

posted Nov 4, 2021, 5:54 AM by Marco Spruit

Today I gave an invited talk on "AI en farmacie in balans?" ("AI and pharmacy in balance?"), titled The STRIP Assistant Decade - Artificial Intelligence for Medication Reviews, at the Nederlandse Vereniging van ZiekenhuisApothekers (NVZA) Jaarcongres 2021 in De Fabrique at Maarssen. Finally a crowd (of around 50 people) to talk to again👍 It is an everlasting story about the long and bumpy road of a great AI idea towards actual innovation in daily care. Some luck and lots of perseverance required...


posted Jul 15, 2021, 8:33 AM by Marco Spruit   [ updated Jul 15, 2021, 8:44 AM ]

As a trivial challenge, you could try to locate my name in this excellent milestone publication:
  • Blum, Sallevelt, Spinewine, O'Mahony, Moutzouri, Feller, Baumgartner, Roumet, Jungo, Schwab, Bretagne, Beglinger, Aubert, Wilting, Thevelin, Murphy, Huibers, Drenth-van Maanen, Boland, Crowley, Eichenberger, Meulendijk, Jennings, Adam, Roos, Gleeson, Shen, Marien, Meinders, Baretella, Netzer, Montmollin, Fournier, Mouzon, O'Mahony, Aujesky, Mavridis, Byrne, Jansen, Schwenkglenks, Spruit, Dalleur, Knol, Trelle, Rodondi (2021). Optimizing Therapy to Prevent Avoidable Hospital Admissions in Multimorbid Older Adults (OPERAM): Cluster Randomised Controlled Trial. BMJ, 374(n1585).
Never been happier with a 41st co-author position😁 It marks the culmination of a decade of hard work on the STRIP Assistant (STRIPA), our Clinical Decision Support System to facilitate medication reviews for polypharmacy patients. Other STRIPA studies include OPTICA and STRIMP. The British Medical Journal (BMJ) is an absolute top journal with a whopping impact factor of 39.98!

It even comes with a nice explainer video👍

TALKS: SAILS Lunch Time Seminar

posted Jun 22, 2021, 1:41 AM by Marco Spruit   [ updated Jun 22, 2021, 1:56 AM ]

On Monday 21 June 2021, I gave a talk at Leiden University's SAILS Lunch Seminar on Natural Language Processing for Translational Data Science in Mental Healthcare. First, I positioned the research domain of Translational Data Science in the context of the COVIDA research programme on Dutch NLP for healthcare. Then, I presented our prognostic study on inpatient violence risk assessment, which applies natural language processing techniques to clinical notes in patients’ electronic health records (Menger et al., 2019). Finally, I discussed follow-up work in which we try to better understand the performance of the best-performing RNN model, using LDA as a text representation method, among others. This reminded us once more of the lingering issue of data quality in EHRs.

Dr. Lefebvre: Research Data Management for Open Science

posted Mar 15, 2021, 10:24 AM by Marco Spruit   [ updated Mar 15, 2021, 10:25 AM ]

Today Armel Lefebvre defended his dissertation Research Data Management for Open Science. Unfortunately, the defense took place in a completely online, COVID-19-proof fashion. Nevertheless, Armel passionately, competently and confidently defended his PhD research! Coincidentally, Armel's dissertation is the first PhD thesis in which I am credited in the role of promotor (instead of being listed as co-promotor).

From the back cover: "This dissertation maps out the challenges in the current practices in science regarding reproducibility and data sharing in research. First, we identify the main stakeholders in the context of open science in Dutch academia. Next, we analyze research practices in the aspects of reproducibility and data management in both the actual laboratory context and scientific publications. We discuss particularly the threats that laboratories would face in the future without the assistance of proper research data management strategies. Finally, we focus on the future of scholarly communication and discuss how research object technology and open science readiness can contribute to open and more reproducible scientific practices."

I bet that in the coming years we will hear much more in the scholarly discourse about the concepts that Armel introduces in this work, especially Laboratory Forensics and Open Science Readiness... Stay tuned!

VACANCY: Assistant Professor Data Science in Population Health (Tenure Track) in Leiden

posted Dec 27, 2020, 12:35 PM by Marco Spruit   [ updated Dec 30, 2020, 1:25 PM ]

What you do

This unique tenure track position offers the best of both worlds: 50% of your work will be performed from the Campus The Hague of the LUMC, and the other 50% from the Leiden Institute of Advanced Computer Science (LIACS) within the Faculty of Science of Leiden University. This means that you will be a strategic linking pin in various collaborations at the junction of data science and natural language processing in the broad area of population health. This position is embedded within the recently launched Population Health Living Lab (PHLL) The Hague, which allows you to contribute to a sustainable and robust realization of the most extensive population dataset within the Netherlands, and consequently to perform novel multidisciplinary data analyses. As assistant professor, you are expected to contribute to at least one of the overarching research themes on our Translational Data Science research agenda. Regarding teaching, you are expected to contribute around 50% of your appointment to LUMC’s Population Health Management (PHM) master’s program and LIACS’s curricula, which includes co-developing, co-teaching, and coordinating the data science courses and the track itself, as well as thesis supervision.


  • You position yourself as an interorganizational linking pin in the Medical Delta ecosystem at the junction of Data Science initiatives in the broad area of Population Health
  • You contribute to the further development of the Population Health Living Lab (PHLL) ecosystem with respect to research related to data engineering and translational data science
  • You contribute to the Population Health Management master’s program by co-developing, co-teaching and coordinating data science courses as well as the track itself

What we ask

You’re an expert in either the research theme of Data Engineering/Information Science or (Big) Data Analytics/Machine Learning, and knowledgeable in the other one. Similarly, you are an expert in applying statistical methods and machine learning techniques to real data. You are conscientious and creative, and you have experience at the postdoctoral level with a strong publication record and a proven track record in teaching. Furthermore, you are experienced in raising research funds. You are passionate about investigating and utilizing data science technologies, focusing on state-of-the-art application-oriented research in Explainable AI, AutoML, Big sensors/wearables data, speech recognition, natural language processing, affective computing, etc. You are skilled in Python development, including libraries such as scikit-learn, Hugging Face, PySyft, and Streamlit. Lastly, you are a skilled communicator and you work well collaboratively.

More information?

Hello Leiden!

posted Dec 1, 2020, 6:45 AM by Marco Spruit   [ updated Jan 14, 2021, 11:52 AM ]

Today is my first day as Professor of Advanced Data Science in Population Health at the Public Health & Primary Care (PHEG) department of the Leiden University Medical Centre (LUMC) and the Leiden Institute of Advanced Computer Science (LIACS) of the Faculty of Science (W&N)! Apart from being a great milestone in itself, here is my TOP-3 of Unique Selling Points that make me particularly excited:
  1. It is a formal DUAL APPOINTMENT, meaning that I am appointed at both LUMC and LIACS. This makes me the official linking pin for the many upcoming collaborations at the junction of data science and natural language processing in healthcare.
  2. In Leiden, my new colleagues have developed over the last few years the LARGEST POPULATION DATASET in the Netherlands, with access to anonymised health records of 500K+ patients, using the Central Bureau of Statistics (CBS) as its Trusted Third Party. Pure gold!
  3. My primary affiliation is within a MULTIDISCIPLINARY setting on the campus The Hague: the Population Health Living Lab (PHLL). This is a so-called QUADRUPLE HELIX fieldlab, where Academia, Industry, Citizens, and Government all collaborate.
I'd like to thank everyone at Utrecht University for the many inspiring informal encounters, personal development programmes and research collaborations that I have had with many of you throughout these... 13 years. I have learned a bunch and it was a lot of fun!

But from now on, it is... Hello Leiden!

PS: I find it truly amazing to discover that my announcement on LinkedIn has been read over 13,000 times already after just one week!

Dr. Tawfik: Text Mining for Precision Medicine

posted Nov 25, 2020, 6:37 AM by Marco Spruit   [ updated Nov 25, 2020, 6:38 AM ]

Yesterday Noha Tawfik defended her dissertation Text Mining for Precision Medicine: Natural Language Processing, Machine Learning and Information Extraction for Knowledge Discovery in the Health Domain. In extreme COVID-19 style, there were merely 8 of us, including the audience, in the Senate Hall of the UU Academiegebouw. Nevertheless, Noha admirably, competently and passionately defended her PhD research!

In Noha's first research phase, she mainly employed Information Extraction to automate the identification and analysis of Genome-Wide Association Studies, given a particular disease, to investigate the relation between different phenotypic traits and Single Nucleotide Polymorphisms known to be associated with that disease. In the second research phase, Noha expanded upon the previous work by applying Machine Learning algorithms to the problem of detecting contradictions between two statements extracted from abstracts of published articles, interpreting contradictory findings as a likely Precision Medicine finding. In the third and final phase of her research, Noha brought her contradiction detection research into conformance with Natural Language Inference (NLI) best practices, and participated in the 2019 ACL "Medical Natural Language Inference" challenge where she battled successfully against entire teams of various top universities.

All in all, a truly excellent achievement in 4 years' time, with no fewer than 7 peer-reviewed publications!

Dr. Omta: a hybrid PhD defense

posted Oct 16, 2020, 1:07 AM by Marco Spruit   [ updated Oct 16, 2020, 1:09 AM ]

On Wednesday 14 October, Wienand Omta successfully defended his dissertation Knowledge Discovery in High Content Screening in Corona-proof hybrid style in the Academiegebouw, for which I was the co-promotor. His work is in Big Data Analytics within the domain of High Content Screening (HCS), a technology that allows life scientists to analyze the effect of bioactive molecules on cellular phenotypes. It is perhaps now more important than ever before, as HCS technology is widely used in drug discovery projects, academia and the pharmaceutical industry, for example, to search for a potential COVID-19 vaccine. Not only does Wienand's dissertation include various impact journal publications, such as the one on Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening, but ever since 2012 he has also worked on the HC StratoMineR platform, for which his spin-off company Core Life Analytics recently secured a 1 M EUR Series A investment. The future is bright!

TALKS: Self-Service Data Science @HEALTHINF 2020

posted Feb 29, 2020, 6:50 AM by Marco Spruit

The 13th International Health Informatics (HealthInf 2020) conference took place in Valletta, and started, interestingly, with a 90-minute panel. The topic was the undeniable gap between research and development, and, worse yet, between development and operation: this is the "long mile" between research and medical practice that keeps our best solutions from also becoming best practices and from achieving lasting impact at the point-of-care, and on the patients' illness trajectories and outcomes. “Has the time come to move from the technical and embrace a more socio-technical, holistic approach?”

Of the keynote speakers in the panel, Helena Canhão introduced her Patient Innovation project, which focuses on patient entrepreneurship and has already collected 1000 innovations; however, many of them have not yet passed the regulatory procedures that ensure patient safety. Roy Ruddle specialises in visual analytics, which helps explain how AI works (XAI) and can be considered a key tool to develop Trust in combination with using open data, open AI models, and external validation. Silvana Quaglini highlighted the role of the attitude of medical professionals and the need to educate the next generations of healthcare professionals to increase understanding of, and thus Trust in, decision support systems and AI technologies. Finally, Federico Cabitza explained the gap between research and practice in more depth, citing some interesting works as well, with titles such as "The Last Mile: Where Artificial Intelligence Meets Reality", "Artificial Intelligence in Health Care: Will the Value Match the Hype?", and "The proof of the pudding: in praise of a culture of real-world validation for medical artificial intelligence". Unfortunately, at least within the regular programme, there were hardly any actual presentations on this key topic, once again illustrating the urgency of this viewpoint...

On a personal note, I presented our poster on Self-Service Data Science for Healthcare Professionals, which addresses this gap between research and practice by supporting the physicians in doing the data analysis themselves, as much as possible, capitalising upon the idea of "Trust Through Empowerment".

VACANCY: PhD position in Personalised Cybersecurity Risk Measurement

posted Feb 20, 2020, 4:32 AM by Marco Spruit   [ updated Mar 2, 2020, 2:18 PM ]

Full-time PhD student or postdoc position in Personalised Cybersecurity Risk Measurement at Utrecht University

Job description

The Applied Data Science lab at Utrecht University (UU) seeks to appoint a full-time and fully funded PhD student for the 4.8M EUR Horizon2020 EU project “GEIGER: The Geiger Cybersecurity Counter” on the topic “Digital Security and privacy for citizens and Small and Medium Enterprises and Micro Enterprises” (SU-DS03-2019-2020). This project builds in part upon the achievements of the SMESEC Horizon2020 EU project.
NB: We also invite qualified postdoc researchers in cybersecurity and data science to apply for a 2.5-year appointment.

The GEIGER project consists of 19 partners who will collaboratively develop an innovative solution with associated components and an Education Ecosystem addressing security, privacy and data protection risks of and for Small and Medium-sized Enterprises and Micro-enterprises (SMEs & MEs) in Europe. GEIGER will be developed by analogy with a Geiger counter for detecting atomic radiation threatening human life. The GEIGER solution will be used for assessing, monitoring, and forecasting risks, and for reducing these risks by improving the SMEs’ & MEs’ security with well-curated tools and an education program targeting practitioners-in-practice as “Certified Security Defenders”, bringing security expertise sustainably to SMEs & MEs using existing vocational education frameworks.

At its core, GEIGER consists of a GEIGER Indicator that dynamically summarizes the current level of risk by evaluating measures undertaken for security defences among the participating SMEs & MEs. The GEIGER Indicator can be personalised by registering the enterprise’s profile and supports GDPR-compliant sharing and exchanging of data about incidents. The GEIGER Toolbox allows stepwise do-it-yourself assessment and improvement of the SMEs’ & MEs’ security, privacy, and data protection with lightweight controls and advice for improved protection at varied levels of sophistication. The included tools offer endpoint, server, and network protection and guide the SMEs & MEs in a personalised manner in data hygiene, including access and security control, data privacy management, and backup practices.

The GEIGER Education Ecosystem offers experimental-based training and cyber range-enabled challenges and will be integrated into curricula of diverse professions of non-ICT experts, offering direct impact on SMEs & MEs through target group-oriented education. The GEIGER solution will be demonstrated in three complementary use cases within three countries. GEIGER will achieve sustainable impact by raising awareness of more than one million SMEs & MEs within a period of 2.5 years after start.

The PhD student’s main tasks are to design and develop the personalised GEIGER Indicator, to co-develop a Cybersecurity knowledge graph relating all knowledge within security standards, and to lead the evaluation of the GEIGER solution in the three extensive use cases.


  1. The candidate should have (before 1 June 2020) an MSc in Data Science, Computer Science, Information Science, Information Security, Cybersecurity, Artificial Intelligence, Computational Linguistics, or other relevant area.
  2. You have excellent programming skills (at least in Python).
  3. You also have good English language skills in both academic writing and presentation.
  4. You are a team player, comfortable with working in a complex project involving multidisciplinary colleagues in different research groups.
  5. (For postdocs only:) You have a track record of publications in impact journals.
In a nutshell, you are ideally proficient in and enthusiastic about:
  • Data Science and Natural Language Processing -- from personalised metric design to component implementation utilising, integrating and benchmarking various data and knowledge sources;
  • Cybersecurity -- assessing practices and standardisation processes; and
  • Pursuing European societal impact -- educating domain professionals in cybersecurity such as SME apprentices, accountants, and start-ups.

Additional information

For more information, please contact Dr. Marco Spruit, associate professor Applied Data Science (UU), m.r.spruit AT uu DOT nl.

Please note that the GEIGER project and, therefore, this position will commence on June 1, 2020. You need not apply if you are still unavailable after this fixed start date. However, if you think you qualify and are interested, or if you think you know someone who may be qualified and interested in pursuing a PhD, please contact us for more information.

Applications should address each of the criteria mentioned under qualifications, and include the following documents:
  • cover letter;
  • curriculum vitae;
  • copy of a recent publication;
  • copy of relevant (PhD/MSc/MA) diplomas and grades.
Please do not submit your application by email but use the official UU application link instead.

The application deadline is Thursday 28 March 2020.
