MS2012-05: Data quality awareness in privacy-preserving data integration and sharing in the public sector

posted Apr 5, 2012, 12:26 AM by Marco Spruit   [ updated Nov 20, 2012, 1:59 AM ]
For public organizations, data integration and sharing are important in delivering better services. However, when sensitive data are integrated and shared, privacy protection and information security become key issues. This means that information systems should be secured and that access to sensitive data should be controlled. This assignment mainly focuses on data sharing that requires one-on-one mapping of individuals, that is, linking each record about a person in one source to the record about the same person in another source. There are several approaches to performing this mapping.

Traditionally, organizations collect all data necessary for data integration and perform the integration themselves. This way, organizations have full control over the integration methods used. This can be useful when one-on-one mapping cannot be done on identifying keys, so that a set of discriminating properties (names, birth dates) has to be collected instead. To find the best candidate match, organizations collect as much personal information as needed to identify false positives and false negatives.
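As an illustration of matching on discriminating properties rather than on an identifying key, the following minimal sketch compares two records on name and birth date. The record layout, the function names and the 0.85 threshold are assumptions chosen for this example; they are not part of the assignment. Lowering the threshold tends to produce more false positives, raising it more false negatives.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Decide whether two records are likely to describe the same person.

    Assumed rule (hypothetical): birth dates must be identical and the
    names must be at least `threshold` similar.
    """
    if rec_a["birth_date"] != rec_b["birth_date"]:
        return False
    return name_similarity(rec_a["name"], rec_b["name"]) >= threshold


# Made-up example records; the second name contains a typing error.
record_a = {"name": "Jan de Vries", "birth_date": "1980-01-01"}
record_b = {"name": "Jan de Vreis", "birth_date": "1980-01-01"}
record_c = {"name": "Piet Jansen", "birth_date": "1980-01-01"}

print(is_match(record_a, record_b))
print(is_match(record_a, record_c))
```

Because the organization holds the plain-text properties, it can inspect borderline pairs and adjust the rule, which is the kind of control the traditional approach offers.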

A more recent approach is the introduction of a trusted third party (TTP) that manages access control to personal information and thus helps to protect the privacy of individuals. In this case, matching has to be done with only as much information as strictly needed. However, measures taken to ensure security and preserve privacy can lower data quality after matching when the matching authority lacks the information to identify false positives or false negatives. This leads to the main question of this assignment:

How can it be guaranteed that privacy-preserving measures do not result in significant deviations in analysis of the integrated dataset, compared to data integration without privacy-preserving measures?


To research this question, several subquestions can be formulated. For example: is it possible to predict the effects of privacy-preserving measures on the data quality after matching? Which characteristics must data have to ensure that privacy-preserving measures do not result in significant deviations in analysis of the integrated dataset, compared to data integration without privacy-preserving measures?
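One way to make the effect on data quality measurable is to link the same records with a privacy-preserving measure and compare the result with a known set of true pairs. The sketch below assumes, purely for illustration, that the measure is a keyed hash (HMAC-SHA256) over name and birth date, so that the matching party only sees pseudonyms and can only match exactly. The key, the record layout and the function names are hypothetical and are not prescribed by the assignment.

```python
import hashlib
import hmac

# Hypothetical key shared by the data suppliers, unknown to the matching party.
SECRET_KEY = b"hypothetical-shared-key"


def pseudonym(name: str, birth_date: str) -> str:
    """Replace the identifying properties by a keyed hash."""
    message = f"{name.strip().lower()}|{birth_date}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def link_on_pseudonym(source_a: dict, source_b: dict) -> set:
    """Return the record-id pairs whose pseudonyms are identical."""
    index = {}
    for rid_b, (name, dob) in source_b.items():
        index.setdefault(pseudonym(name, dob), []).append(rid_b)
    pairs = set()
    for rid_a, (name, dob) in source_a.items():
        for rid_b in index.get(pseudonym(name, dob), []):
            pairs.add((rid_a, rid_b))
    return pairs


def precision_recall(found: set, truth: set) -> tuple:
    """Precision drops with false positives, recall with false negatives."""
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Made-up data: record b1 contains a typing error in the name.
source_a = {"a1": ("Jan de Vries", "1980-01-01"), "a2": ("Anna Bakker", "1975-05-05")}
source_b = {"b1": ("Jan de Vreis", "1980-01-01"), "b2": ("anna bakker ", "1975-05-05")}
truth = {("a1", "b1"), ("a2", "b2")}

found = link_on_pseudonym(source_a, source_b)
print(precision_recall(found, truth))
```

In this sketch the typing error in record b1 changes the pseudonym completely, so the pair is missed and cannot be recovered by the matching party. Running the same comparison with plain-text matching on the same records would give a baseline against which the deviation can be expressed.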


The expected result is a framework for data-quality-aware usage of privacy-preserving measures and TTPs, supported by one or more case studies.

This is an internship at the Research and Documentation Centre (WODC) of the Ministry of Security and Justice in The Hague. Case studies are likely to be performed at external organizations.

For more info, contact Marco.