A Consensus Privacy Metrics Framework for Synthetic Data
Pilgram, Lisa, Dankar, Fida K., Drechsler, Jörg, Elliot, Mark, Domingo-Ferrer, Josep, Francis, Paul, Kantarcioglu, Murat, Kong, Linglong, Malin, Bradley, Muralidhar, Krishnamurty, Myles, Puja, Prasser, Fabian, Raisaro, Jean Louis, Yan, Chao, El Emam, Khaled
arXiv.org Artificial Intelligence
Synthetic data generation is one approach to sharing individual-level data. However, to meet legislative requirements, it is necessary to demonstrate that individuals' privacy is adequately protected, and there is currently no consolidated standard for measuring privacy in synthetic data. Through an expert panel and consensus process, we developed a framework for evaluating privacy in synthetic data. Our findings indicate that current similarity metrics fail to measure identity disclosure, and their use is discouraged. For differentially private synthetic data, privacy budgets that are not close to zero were not considered interpretable. There was consensus on the importance of membership disclosure and attribute disclosure, both of which involve inferring personal information about an individual without necessarily revealing their identity. The resulting framework provides precise recommendations for metrics that effectively address these types of disclosure. Our findings also identify specific opportunities for future research that can support the widespread adoption of synthetic data.
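One common way to see why larger privacy budgets are hard to interpret is a standard reading of differential privacy, added here as an illustration rather than the panel's own reasoning. Under pure ε-differential privacy, the probability of any output changes by at most a factor of e^ε between two datasets that differ in one individual's record. An adversary's odds that a target was included in the training data can therefore grow by at most that factor. The sketch below assumes pure ε-DP (no δ term), and `max_posterior` is a hypothetical helper name. It converts ε into the worst-case posterior belief in membership, starting from a given prior.

```python
import math


def max_posterior(prior: float, epsilon: float) -> float:
    """Worst-case posterior belief that a target record was used, under pure epsilon-DP.

    Illustrative sketch (assumption: pure epsilon-DP, no delta term).
    epsilon-DP bounds the likelihood ratio of any output between neighbouring
    datasets by exp(epsilon), so posterior odds <= exp(epsilon) * prior odds.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = math.exp(epsilon) * prior_odds
    # Convert the bounded odds back into a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Show how the worst-case belief grows with the privacy budget from a 50% prior.
    for eps in (0.1, 1.0, 5.0, 10.0):
        print(
            f"epsilon={eps:>4}: odds factor {math.exp(eps):10.2f}, "
            f"max posterior from 50% prior {max_posterior(0.5, eps):.4f}"
        )
```

Starting from a 50% prior, ε = 1 already permits a worst-case posterior of about 0.73, and ε = 10 permits a posterior above 0.9999. This is consistent with the view that only budgets close to zero have an intuitive privacy interpretation.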
Mar-6-2025
- Country:
- Asia
- Middle East > Israel (0.04)
- Singapore (0.04)
- South Korea (0.04)
- Europe
- Denmark > Capital Region
- Copenhagen (0.04)
- Germany
- Bavaria
- Middle Franconia > Nuremberg (0.04)
- Upper Bavaria > Munich (0.04)
- Berlin (0.14)
- Italy (0.04)
- Netherlands (0.14)
- Spain
- Andalusia > Córdoba Province
- Córdoba (0.04)
- Catalonia > Tarragona Province
- Tarragona (0.04)
- Switzerland > Vaud
- Lausanne (0.04)
- United Kingdom > England
- Greater London > London (0.04)
- Greater Manchester > Manchester (0.04)
- North America
- Canada
- Alberta (0.14)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario
- National Capital Region > Ottawa (0.04)
- Toronto (0.04)
- United States
- California (0.04)
- Washington (0.04)
- Florida > Palm Beach County
- Boca Raton (0.04)
- District of Columbia > Washington (0.04)
- Virginia (0.04)
- Vermont (0.04)
- Oklahoma (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Tennessee > Davidson County
- Nashville (0.04)
- New York
- Monroe County > Rochester (0.04)
- New York County > New York City (0.04)
- Maryland > Montgomery County
- Rockville (0.04)
- New Mexico > Los Alamos County
- Los Alamos (0.04)
- Maine (0.04)
- Texas > Travis County
- Austin (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Government > Regional Government
- Health & Medicine
- Consumer Health (0.92)
- Epidemiology (0.93)
- Health Care Providers & Services (1.00)
- Health Care Technology > Medical Record (0.68)
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area
- Cardiology/Vascular Diseases (1.00)
- Immunology > HIV (0.93)
- Infections and Infectious Diseases (1.00)
- Internal Medicine (0.68)
- Oncology (0.92)
- Information Technology > Security & Privacy (1.00)
- Law > Civil Rights & Constitutional Law (0.92)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Neural Networks (1.00)
- Performance Analysis > Accuracy (1.00)
- Statistical Learning (1.00)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Communications (1.00)
- Data Science > Data Mining (1.00)
- Information Management (1.00)
- Modeling & Simulation (1.00)
- Security & Privacy (1.00)