Synthetic Data Does Not Reliably Protect Privacy, Researchers Claim
A new research collaboration between France and the UK casts doubt on growing industry confidence that synthetic data can resolve the privacy, quality and availability issues (among others) that threaten progress in the machine learning sector.

Among several key points, the authors assert that synthetic data modeled on real data retains enough of the genuine information to offer no reliable protection against inference and membership attacks, which seek to deanonymize data and re-associate it with actual people. Furthermore, the individuals most at risk from such attacks, including those with critical medical conditions or high hospital bills (in the case of medical record anonymization), are, through the 'outlier' nature of their data, the most likely to be re-identified by these techniques.

The authors state: 'Given access to a synthetic dataset, a strategic adversary can infer, with high confidence, the presence of a target record in the original data.'

The paper also notes that differentially private synthetic data, which obscures the signature of individual records, does indeed protect individuals' privacy, but only by significantly crippling the usefulness of the information retrieval systems that use it.
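To illustrate the intuition behind a membership inference attack on synthetic data, the toy sketch below uses a simple nearest-neighbour heuristic: if a target record sits unusually close to the released synthetic records compared with a reference population of non-members, the adversary guesses it was in the training data. This is a minimal, hypothetical illustration, not the attack method from the paper; the function names and the `quantile` threshold are assumptions for demonstration only.

```python
import numpy as np

def min_distance_score(target, synthetic):
    """Distance from the target record to its nearest synthetic record.

    A very small distance suggests the generator may have memorised the
    target, hinting that it was present in the original training data.
    Toy heuristic only -- not the attack described in the paper.
    """
    diffs = synthetic - target            # broadcast over synthetic rows
    dists = np.linalg.norm(diffs, axis=1)
    return dists.min()

def membership_guess(target, synthetic, reference, quantile=0.05):
    """Guess membership by comparing the target's nearest-neighbour
    distance against scores for a reference population of known
    non-members. Returns True if the target is closer to the synthetic
    data than all but `quantile` of the reference records.
    """
    target_score = min_distance_score(target, synthetic)
    ref_scores = np.array(
        [min_distance_score(r, synthetic) for r in reference]
    )
    threshold = np.quantile(ref_scores, quantile)
    return bool(target_score <= threshold)

# Hypothetical demo: a generator that "leaked" an outlier record.
rng = np.random.default_rng(0)
target = np.array([5.0, 5.0])                      # outlier record
synthetic = np.vstack([rng.normal(0, 1, (50, 2)),  # typical synthetic rows
                       target])                    # memorised copy
reference = rng.normal(0, 1, (20, 2))              # known non-members
print(membership_guess(target, synthetic, reference))
```

The example also shows why outliers are the most exposed: an unusual record that the generator memorises has no near neighbours among ordinary synthetic rows, so its nearest-neighbour distance stands out sharply from the reference population.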
Sep-26-2021, 01:55:38 GMT