Please refer to my Research Projects page for active projects in this theme.

Analytic Systems

As principal investigator in the department's Applied Data Science Lab, I lead a research line that centres on Analytic Systems in an Applied Data Science context, with a special focus on Health Analytic Systems. Within this research theme I investigate the utility determinants of analytic systems in daily practices from a people-process-technology perspective. A novel aspect of this theme is its aim to collect utility measurements from daily practices rather than from computational experiments. This implies a significant software prototype engineering effort before an analytic system’s utility can actually be determined. For example, Meulendijk, Spruit, et al. (2015) evaluate the usability of the STRIP Assistant, an analytic system that helps physicians optimise the medical records of polypharmacy patients, by jointly measuring its effectiveness, efficiency and user satisfaction.

Applied Data Science

I formalised these research objectives regarding Analytic Systems in Applied Data Science in Spruit & Jagesar (2016), which defines Applied Data Science as “the knowledge discovery process in which analytical applications are designed and evaluated to improve the daily practices of domain experts”. This contrasts with more fundamental data science, which primarily aims to develop novel statistical and machine learning techniques. Nevertheless, novel applications of data science methodology and engineering to a particular scientific domain are likely to raise new fundamental data science research questions, in line with the UPADS (2017) Starting document.

Information Infrastructure

Two embedded research strands within my Analytic Systems research line focus on Information Infrastructure and Text Analytics. In its broadest sense, the research theme Information Infrastructure investigates "the technical, social, and political framework that encompasses the people, technology, tools, and services used to facilitate the distributed, collaborative use of content over time and distance" (Borgman, 2010:19). An Information Infrastructure can refer to either a strictly structured data warehouse or a loosely coupled big data lake. I consider trade-offs such as interoperability versus uniformity, data quality versus usability, and standardisation versus situationality (e.g. Hanseth et al., 1996). For example, Dijk, Bargh, Choenni & Spruit (2017) describe an innovative data quality resolving architecture, whereas Shen, Meulendijk & Spruit (2016) present a federated information architecture for multinational clinical trials. Moreover, this research strand includes the design of managerial artefacts that help organisations evaluate and improve their Information Infrastructure in daily practice, such as maturity models for incremental process improvement (e.g. Baars et al., 2016).

Design Science

I primarily apply a Design Science research approach in which an analytic system prototype functions as a research intervention instrument. The prototype is used to evaluate the Design Science artefact under development (e.g. a method, model, process, framework, or architecture), using metrics such as effectiveness, efficiency and usability to determine the analytic system’s societal impact. For a comprehensive overview of relevant artefact evaluation metrics, I refer to Prat, Comyn-Wattiau & Akoka (2014).

Meta-algorithmic modelling

Research artefacts can be modelled using Meta-algorithmic modelling, which I define as “an engineering discipline where sequences of algorithm selection and configuration activities are specified deterministically for performing analytical tasks based on problem-specific data input characteristics and process preferences” (Spruit & Jagesar, 2016).
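To make this definition more concrete, the following Python sketch specifies a deterministic sequence of algorithm selection and configuration steps, driven by characteristics of the input data and by a single process preference. The profile fields, algorithm names and parameter values are hypothetical illustrations chosen for this example; they are not the models or rules from Spruit & Jagesar (2016).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProfile:
    """Hypothetical problem-specific data input characteristics."""
    n_samples: int
    is_text: bool
    has_labels: bool


@dataclass(frozen=True)
class ProcessPreferences:
    """Hypothetical process preference of the domain expert."""
    prefer_interpretability: bool


def plan_pipeline(data: DataProfile, prefs: ProcessPreferences) -> list[tuple[str, dict]]:
    """Return a deterministic sequence of (algorithm, configuration) steps.

    There is no randomness and no search: identical inputs always
    yield an identical plan. Algorithm names are illustrative labels only.
    """
    plan: list[tuple[str, dict]] = []

    # Step 1: select a data representation based on the input type.
    if data.is_text:
        plan.append(("tfidf", {"lowercase": True, "min_df": 2}))
    else:
        plan.append(("standard_scaler", {}))

    # Step 2: without labels, select an unsupervised task.
    if not data.has_labels:
        if data.is_text:
            plan.append(("lda", {"n_topics": 10}))
        else:
            plan.append(("kmeans", {"n_clusters": 5}))
        return plan

    # Step 3: with labels, trade interpretability against model capacity,
    # and configure the chosen algorithm based on the data size.
    if prefs.prefer_interpretability:
        plan.append(("decision_tree", {"max_depth": 3 if data.n_samples < 1_000 else 5}))
    else:
        plan.append(("random_forest", {"n_estimators": 100 if data.n_samples < 1_000 else 500}))
    return plan


if __name__ == "__main__":
    plan = plan_pipeline(
        DataProfile(n_samples=500, is_text=True, has_labels=True),
        ProcessPreferences(prefer_interpretability=True),
    )
    print(plan)
    # [('tfidf', {'lowercase': True, 'min_df': 2}), ('decision_tree', {'max_depth': 3})]
```

The plan is returned as data (algorithm names plus configurations) rather than executed directly, so that a domain expert could inspect it before any analysis runs; in a real setting each name would map onto a concrete library implementation.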

Text Analytics

The research theme Text Analytics investigates natural language processing (NLP) systems in daily practices from a people-process-technology perspective. The following three examples illustrate this type of research. In Spruit & Vlug (2015) we present a text snippet enrichment process to improve the classification of financial transactions. Menger, Scheepers, Wijk, & Spruit (in press) develop a pattern matching method for automatic de-identification of Dutch medical text. Finally, in Syed & Spruit (in press) we examine the nature of the input data parameter for the Latent Dirichlet Allocation (LDA) algorithm in the NLP task of topic modelling.
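As a rough illustration of what pattern-matching de-identification involves, the sketch below uses regular expressions to replace a few kinds of identifiers in Dutch text with category tags. The patterns and tag names are assumptions chosen for illustration; they are not the method developed by Menger et al., which would also need to handle person names, context and evaluation against annotated records.

```python
import re

# Hypothetical, deliberately simple patterns (illustration only).
# Order matters: emails and dates are replaced before phone numbers,
# so that their digits cannot be mistaken for phone numbers.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("DATE", re.compile(r"\b\d{1,2}[-/]\d{1,2}[-/]\d{4}\b")),      # e.g. 12-03-2016
    ("PHONE", re.compile(r"\b0\d{1,2}-?\d{7,8}\b")),              # e.g. 06-12345678, 030-1234567
    ("POSTCODE", re.compile(r"\b\d{4}\s?[A-Z]{2}\b")),            # e.g. 3584 CX
]


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a [TAG] placeholder."""
    for tag, pattern in PATTERNS:
        text = pattern.sub(f"[{tag}]", text)
    return text


if __name__ == "__main__":
    # Dutch sample input
    note = "Gezien op 12-03-2016; bereikbaar via 06-12345678 of j.jansen@example.nl, postcode 3584 CX."
    print(deidentify(note))
    # Gezien op [DATE]; bereikbaar via [PHONE] of [EMAIL], postcode [POSTCODE].
```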