Shaheen Syed: From MSc to Dr

Posted Mar 21, 2019, 5:43 AM by Marco Spruit [updated Dec 6, 2019, 8:08 AM]
Yesterday was a great day: Shaheen Syed successfully defended his dissertation, titled "Topic Discovery from Textual Data: Machine Learning and Natural Language Processing for Knowledge Discovery in the Fisheries Domain", in the Academiegebouw. In my opinion, it is a solid reference work for research in Utrecht University's Applied Data Science focus area, especially for Natural Language Processing applications and foundational research as envisioned by the Special Interest Group (SIG) Text Mining. Here are some key points from this work.

The main research question in this thesis is: How can we improve the knowledge discovery process from textual data through latent topical perspectives? The first three chapters seek to understand how different types of textual data, pre-processing steps, and hyper-parameter settings of probabilistic topic models affect the quality of the derived latent topics. The remaining three chapters address the interpretation of the latent topics and how such (raw) latent topics can be turned into useful (fisheries) domain knowledge. Throughout the thesis, each chapter covers specific phases of the knowledge discovery in databases (KDD) process. Combined, the chapters provide guidelines on how to optimize the knowledge discovery process, with the aim of better understanding the latent topical content of scientific publications.

This work was funded by the Horizon 2020 Marie Skłodowska-Curie ITN-ETN grant SAF21.