Synthcity: a benchmark framework for diverse use cases of tabular synthetic data

Apr-24-2026, 13:53:54 GMT–Neural Information Processing Systems

Accessible high-quality data is the bread and butter of machine learning research,1 and the demand for data has exploded as larger and more advanced ML models are2 built across different domains. Yet, real data often contain sensitive information,3 subject to various biases, and are costly to acquire, which compromise their quality4 and accessibility. Synthetic data have thus emerged as a complement, sometimes5 even a replacement, to real data for ML training. However, the landscape of6 synthetic data research has been fragmented due to the large number of data7 modalities (e.g., tabular data, time series data, images, etc.) and various use cases8 (e.g., privacy, fairness, data augmentation, etc.). This poses practical challenges9 in comparing and selecting synthetic data generators in different problem settings.10 To this end, we develop Synthcity, an open-source Python library that allows11 researchers and practitioners to perform one-click benchmarking of synthetic data12 generators across data modalities and use cases. In addition, Synthcity's plug-in13 style API makes it easy to incorporate additional data generators into the framework.14 Beyond benchmarking, it also offers a single access point to a diverse range of15 cutting-edge data generators. Through examples on tabular data generation and16 data augmentation, we illustrate the general applicability of Synthcity, and the17 insight one can obtain.18

data mining, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Apr-24-2026, 13:53:54 GMT

Conferences PDF

Add feedback

Genre:
- Research Report (0.68)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language (0.98)
    - Machine Learning > Neural Networks
      - Deep Learning (0.69)

Duplicate Docs Excel Report

Title
09723c9f291f6056fd1885081859c186-Paper-Datasets_and_Benchmarks.pdf

Similar Docs Excel Report more

Title	Similarity	Source
None found