EDU: Data Science & Society 2018

posted Nov 13, 2018, 6:50 AM by Marco Spruit   [ updated Nov 13, 2018, 7:21 AM ]
The Applied Data Science Lab has just finished teaching the Data Science & Society course to 120 students. We revised the course significantly so that it covers the research fields, and the interdependencies between them, shown in the conceptual Venn diagram.

In a nutshell, and illustrative of applied data science research, a series of guest lectures regularly focused on relevant questions in a number of data science application domains, including neonatology, epidemiology, geoscience, marketing, psychiatry, cell biology, and ethics & privacy. This helps students better understand the role of data science and its societal impact (ILO1).

Next, students applied the CRISP-DM knowledge discovery process in both lectures and many workshop sessions, with special attention to methodological issues in Big Data analyses such as p-value interpretation, multiple testing, replicability, overfitting, and construct validity. This teaches students to recognise the knowledge discovery processes in applied data science (ILO2).

Throughout the course we maintained a Big Data focus, operationalised in a review assignment on a popular data science book, which clarified the particularities of big data in relation to data warehousing, SQL vs NoSQL, and ethical and privacy implications. In this way we help students identify trends and developments in big data technologies (ILO3).

The Cloud Computing focus provides a thorough engineering component by using MS Azure as the Infrastructure-as-a-Service environment. Every student worked individually on their own Virtual Machine on weekly Hadoop and Spark assignments with real data and real research questions within an MS DevTest Labs context, mostly on Data Science Virtual Machine (DSVM) images. In this way, students actually apply selected big data technologies to solve real-world problems (ILO4).

All these tasks prepare students to help empower domain experts to run their own analyses, possibly by using pretrained models and APIs, and so help realise our services computing-compatible vision of self-service data science.

We concluded the course with an online final exam in Remindo, consisting of 85 multiple-choice questions, with the following statistics as reported in Remindo:
We are quite content with the results, as the exam was intended to be more thorough than the Remindo midterm exam with 95 questions (which yielded significantly higher grades). The results are clearly close to normally distributed, with a good Cronbach's alpha of >0.80. Must be a decent assessment, then!
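
For readers less familiar with the reliability measure mentioned above: Cronbach's alpha can be computed from a students-by-questions score matrix as k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of questions. The sketch below is a minimal illustration of that standard formula, not the way Remindo computes it; the function name `cronbach_alpha` and the toy answer matrix are hypothetical and do not reflect the actual exam data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (hypothetical helper).

    scores: list of rows, one per student; each row holds that
    student's score per question (e.g. 1 = correct, 0 = incorrect).
    """
    if len(scores) < 2:
        raise ValueError("need at least two students")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two questions")

    # Sample variance of each question's scores across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the students' total scores.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up toy data: 5 students, 4 questions (NOT the real exam results).
toy_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy_scores), 2))
```

Values closer to 1 indicate that the questions tend to rank students consistently; a value above 0.80 is commonly read as good internal consistency.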