Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead
Schlegel, Viktor; Bharath, Anil A; Zhao, Zilong; Yee, Kevin
arXiv.org Artificial Intelligence
Privacy-preserving synthetic data offers a promising way to harness segregated data in high-stakes domains where information is compartmentalized for regulatory, privacy, or institutional reasons. This survey provides a comprehensive framework for understanding the landscape of privacy-preserving synthetic data, presenting the theoretical foundations of generative models and differential privacy, followed by a review of state-of-the-art methods across tabular data, images, and text. Our synthesis of evaluation approaches highlights the fundamental trade-off between utility for downstream tasks and privacy guarantees, and identifies critical research gaps: the lack of realistic benchmarks representing specialized domains, and a shortage of the empirical evaluations needed to contextualise formal guarantees. Through empirical analysis of four leading methods on five real-world datasets from specialized domains, we demonstrate significant performance degradation under realistic privacy constraints ($\epsilon \leq 4$), revealing a substantial gap between results reported on general-domain benchmarks and performance on domain-specific data. Our findings highlight key challenges, including unaccounted privacy leakage, insufficient empirical verification of formal guarantees, and a critical deficit of realistic benchmarks. These challenges underscore the need for robust evaluation frameworks, standardized benchmarks for specialized domains, and improved techniques that address the unique requirements of privacy-sensitive fields, so that this technology can deliver on its considerable potential.
Mar-26-2025
- Country:
  - North America
    - Montserrat (0.04)
    - United States
      - Pennsylvania > Philadelphia County
        - Philadelphia (0.04)
      - New York > New York County
        - New York City (0.04)
      - Indiana > Marion County
        - Indianapolis (0.04)
      - California > San Diego County
        - San Diego (0.04)
  - Europe > Belgium
    - Brussels-Capital Region > Brussels (0.04)
  - Asia
    - Singapore (0.04)
    - Middle East > Jordan (0.04)
- Genre:
  - Overview (1.00)
  - Research Report > Promising Solution (0.86)
- Industry:
  - Law (1.00)
  - Information Technology > Security & Privacy (1.00)
  - Government (0.92)
  - Banking & Finance (0.67)
  - Health & Medicine
    - Therapeutic Area (1.00)
    - Health Care Technology (0.92)
    - Diagnostic Medicine > Imaging (0.92)
- Technology:
  - Information Technology
    - Security & Privacy (1.00)
    - Knowledge Management (1.00)
    - Information Management (1.00)
    - Communications (1.00)
    - Data Science > Data Mining
      - Big Data (0.86)
    - Artificial Intelligence
      - Representation & Reasoning (1.00)
      - Natural Language
        - Large Language Model (1.00)
        - Chatbot (0.67)
      - Machine Learning
        - Statistical Learning (1.00)
        - Performance Analysis > Accuracy (1.00)
        - Neural Networks > Deep Learning (1.00)
        - Learning Graphical Models (0.67)