On the role of data anonymization in machine learning privacy
2020 (English). In: Proceedings - 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2020 / [ed] Guojun Wang, Ryan Ko, Md Zakirul Alam Bhuiyan, Yi Pan, IEEE, 2020, p. 664-675. Conference paper, Published paper (Refereed)
Abstract [en]
Data anonymization irrecoverably transforms raw data into a protected version by eliminating direct identifiers and removing sufficient detail from indirect identifiers, so as to minimize the risk of re-identification when the data must be published. Furthermore, data protection laws (e.g., the GDPR) do not regard anonymized data as personal data, allowing it to be freely used, analysed, shared and monetized without compliance risk. Given these advantages, it is plausible that data controllers anonymize data before releasing it for data analysis tasks such as machine learning (ML), which is applied in a wide variety of domains where personal data are used. At the same time, recent research has shown that ML models are vulnerable to privacy attacks, as they retain sensitive information from the training data. Taking these facts into consideration, in this work we explore the interplay between data anonymization and ML, with the ultimate aim of clarifying whether data anonymization is sufficient to achieve privacy for ML under different adversarial scenarios. We also discuss the challenges and opportunities of integrating these two domains. Our findings show that, in order to substantially reduce the privacy risks in ML, existing data anonymization techniques have to be applied with high privacy levels, which causes a deterioration in model utility.
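To make the anonymization step described above concrete, the following is a minimal sketch of k-anonymity-style generalization applied to a table before it is released for ML training. It is not the authors' implementation; the column names (name, ssn, age, zip, sex), the bin widths and the value of k are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's pipeline): drop direct
# identifiers, coarsen quasi-identifiers, and suppress records whose
# equivalence class is smaller than k.
import pandas as pd

def generalize(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    anon = df.drop(columns=["name", "ssn"])                # remove direct identifiers
    anon["age"] = (anon["age"] // 10) * 10                 # 10-year age bands (assumed binning)
    anon["zip"] = anon["zip"].astype(str).str[:3] + "**"   # truncate ZIP code (assumed scheme)
    quasi = ["age", "zip", "sex"]                          # assumed quasi-identifiers
    sizes = anon.groupby(quasi)["age"].transform("size")   # size of each equivalence class
    return anon[sizes >= k]                                # suppress classes smaller than k
```

Raising k (a higher privacy level) merges or suppresses more records, which illustrates the utility deterioration for downstream ML models that the abstract refers to.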
Place, publisher, year, edition, pages
IEEE, 2020. p. 664-675
Series
IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), ISSN 2324-898X, E-ISSN 2324-9013
Keywords [en]
Data anonymization, Data privacy, Privacy-preserving machine learning, Machine learning, Data controllers, Data protection laws, Data publishing, Privacy attacks, Re-identification, Sensitive information, Privacy by design
National Category
Computer Sciences; Computer Systems
Research subject
Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-19522
DOI: 10.1109/TrustCom50675.2020.00093
ISI: 000671077600079
Scopus ID: 2-s2.0-85101295825
ISBN: 978-0-7381-4380-4 (electronic)
ISBN: 978-0-7381-4381-1 (print)
OAI: oai:DiVA.org:his-19522
DiVA id: diva2:1534134
Conference
2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2020, 29 December 2020 – 1 January 2021, Guangzhou, China
Part of project
Disclosure risk and transparency in big data privacy, Swedish Research Council
Funder
Swedish Research Council, 2016-03346
Note
© 2020 IEEE.
This work is supported by Vetenskapsrådet project: “Disclosure risk and transparency in big data privacy” (VR 2016-03346, 2017-2020).
Available from: 2021-03-04 Created: 2021-03-04 Last updated: 2021-08-20 Bibliographically approved