Machine learning (ML) models trained on sensitive data pose a distinct threat to privacy, as numerous threat models have emerged that exploit their privacy vulnerabilities. Privacy-preserving machine learning (PPML) has therefore gained increased attention over the past few years. Existing PPML techniques in the literature are mainly based on differential privacy or on cryptographic techniques; the former is criticized for the poor predictive accuracy of the derived ML models, and the latter for its extensive computational cost. Moreover, both operate under the assumption that the original data are always available for training the ML models. However, there exist scenarios where anonymized data are available instead of the original data. There are valid organizational and legal requirements for data publishing, and anonymization of sensitive data is required before publishing them in order to preserve the privacy of the underlying data subjects. In this case, it is important to understand the impact of data anonymization on ML in general, and how it can be used as a stepping stone towards PPML.

The proposed research aims to understand the opportunities and challenges for PPML in the context of data anonymization, and to address them effectively by developing a unified solution that serves the objectives of both data anonymization and PPML.
Research proposal, PhD programme, University of Skövde