Rethinking Synthetic Data definitions: A privacy driven approach

Vallevik, Vibeke Binz, Marshall, Serena Elizabeth, Babic, Aleksandar, Nygaard, Jan Franz

Mar-5-2025–arXiv.org Artificial Intelligence

Synthetic data is emerging as a cost-effective solution necessary to meet the increasing data demands of AI development and can be generated either from existing knowledge or derived from real data. The traditional classification of synthetic data into hybrid, partial or fully synthetic datasets has limited value and does not reflect the ever-growing range of methods used to generate synthetic data. The characteristics of synthetic data are greatly shaped by the generation method and the data source, which in turn determine its practical applications. We suggest a different approach to grouping synthetic data types that better reflects privacy perspectives. This is a crucial step towards improved regulatory guidance in the generation and processing of synthetic data. This approach to classification provides flexibility to accommodate new advancements like deep generative methods and offers a more practical framework for future applications.

artificial intelligence, machine learning, synthetic data

Country:
- Europe > Norway (0.15)

Genre:
- Research Report (0.50)

Industry:
- Health & Medicine (1.00)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining (0.71)