Rethinking Synthetic Data definitions: A privacy driven approach
Vallevik, Vibeke Binz, Marshall, Serena Elizabeth, Babic, Aleksandar, Nygaard, Jan Franz
arXiv.org Artificial Intelligence
Synthetic data is emerging as a cost-effective solution to meet the increasing data demands of AI development, and it can be either generated from existing knowledge or derived from real data. The traditional classification of synthetic data into hybrid, partially synthetic, or fully synthetic datasets has limited value and does not reflect the ever-growing range of methods used to generate synthetic data. The characteristics of synthetic data are strongly shaped by the generation method and the source, which in turn determine its practical applications. We suggest a different approach to grouping synthetic data types that better reflects privacy perspectives. This is a crucial step towards improved regulatory guidance on the generation and processing of synthetic data. This approach to classification accommodates new advancements such as deep generative methods and offers a more practical framework for future applications.
Mar-5-2025