Description
An introduction to the essential aspects of personal data anonymization: formats, techniques, and process. Finally, we summarize how data anonymization affects Machine Learning models.
Summary
- Data anonymization is the process of altering personally identifiable information (PII) in a dataset so that individuals cannot be identified.
- Suppressed values are those that appear only a few times in the original dataset, because they pose a high disclosure risk for the records that contain them.
- The original dataset is blended with a fully synthetic one.
- “If we don’t manage to figure out how to build Machine Learning systems that have good security properties and that protect the privacy of information, that would really limit the usefulness of Machine Learning for many applications that we care about.” Martín Abadi, a Google researcher, said this at the Khipu conference in 2019 while delivering an excellent overview of Privacy and Security in Machine Learning.
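
The suppression rule mentioned in the summary (hiding values that appear only a few times) can be illustrated with a minimal sketch. The function name `suppress_rare_values`, the `min_count` threshold, the `"*"` placeholder, and the sample data are all assumptions made for illustration; real anonymization tools usually choose the threshold from a formal disclosure-risk measure.

```python
from collections import Counter


def suppress_rare_values(values, min_count=2, placeholder="*"):
    """Replace values that occur fewer than `min_count` times with a placeholder.

    Hypothetical helper: rare values are treated as high disclosure risk
    because they can single out the few records that contain them.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else placeholder for v in values]


# Illustrative column: "Puno" appears once, so it is suppressed.
cities = ["Lima", "Lima", "Cusco", "Lima", "Puno", "Cusco"]
suppressed = suppress_rare_values(cities, min_count=2)
print(suppressed)  # ['Lima', 'Lima', 'Cusco', 'Lima', '*', 'Cusco']
```

Raising `min_count` suppresses more values, which lowers disclosure risk but also removes more information from the dataset.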