

 synthcity



Synthcity: a benchmark framework for diverse use cases of tabular synthetic data

Neural Information Processing Systems

Accessible high-quality data is the bread and butter of machine learning research, and the demand for data has exploded as larger and more advanced ML models are built across different domains. Yet, real data often contain sensitive information, are subject to various biases, and are costly to acquire, which compromise their quality and accessibility. Synthetic data have thus emerged as a complement to, sometimes even a replacement for, real data for ML training. However, the landscape of synthetic data research has been fragmented due to the diverse range of data modalities, such as tabular, time series, and images, and the wide array of use cases, including privacy preservation, fairness considerations, and data augmentation. This fragmentation poses practical challenges when comparing and selecting synthetic data generators for different problem settings. To this end, we develop Synthcity, an open-source Python library that allows researchers and practitioners to perform one-click benchmarking of synthetic data generators across data modalities and use cases. Beyond benchmarking, Synthcity serves as a centralized toolkit for accessing cutting-edge data generators. In addition, Synthcity's flexible plug-in style API makes it easy to incorporate additional data generators into the framework. Using examples of tabular data generation and data augmentation, we illustrate the general applicability of Synthcity, and the insight one can obtain.
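The "plug-in style API" the abstract describes can be illustrated with a small registry sketch. Note that the class and plugin names below are illustrative only, not synthcity's actual API: the idea is that generators register themselves under a name and are then created and used through one uniform interface.

```python
import random


class GeneratorRegistry:
    """Maps generator names to classes so new generators drop in uniformly."""

    def __init__(self):
        self._plugins = {}

    def register(self, name):
        def wrap(cls):
            self._plugins[name] = cls
            return cls
        return wrap

    def get(self, name, **kwargs):
        return self._plugins[name](**kwargs)

    def list(self):
        return sorted(self._plugins)


registry = GeneratorRegistry()


@registry.register("noise")
class NoiseGenerator:
    """Toy generator: resamples rows and perturbs them with Gaussian noise."""

    def __init__(self, scale=0.1):
        self.scale = scale

    def fit(self, rows):
        self.rows = list(rows)
        return self

    def generate(self, count, seed=0):
        rng = random.Random(seed)
        return [[v + rng.gauss(0.0, self.scale) for v in rng.choice(self.rows)]
                for _ in range(count)]


# All registered generators share the same fit/generate workflow.
gen = registry.get("noise", scale=0.05).fit([[1.0, 2.0], [3.0, 4.0]])
samples = gen.generate(5)
```

Because benchmarking code only depends on the registry interface, adding a new generator is a matter of registering one more class, which is what makes one-click comparison across many generators tractable.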





Debiasing Synthetic Data Generated by Deep Generative Models

Decruyenaere, Alexander, Dehaene, Heidelinde, Rabaey, Paloma, Polet, Christiaan, Decruyenaere, Johan, Demeester, Thomas, Vansteelandt, Stijn

arXiv.org Machine Learning

While synthetic data hold great promise for privacy protection, their statistical analysis poses significant challenges that necessitate innovative solutions. The use of deep generative models (DGMs) for synthetic data generation is known to induce considerable bias and imprecision into synthetic data analyses, compromising their inferential utility relative to original data analyses. This bias and uncertainty can be substantial enough to impede statistical convergence rates, even in seemingly straightforward analyses like mean calculation. The standard errors of such estimators then exhibit slower shrinkage with sample size than the typical $1/\sqrt{n}$ rate. This complicates fundamental calculations like p-values and confidence intervals, with no straightforward remedy currently available. In response to these challenges, we propose a new strategy that targets synthetic data created by DGMs for specific data analyses. Drawing insights from debiased and targeted machine learning, our approach accounts for biases, enhances convergence rates, and facilitates the calculation of estimators with easily approximated large sample variances. We exemplify our proposal through a simulation study on toy data and two case studies on real-world data, highlighting the importance of tailoring DGMs for targeted data analysis. This debiasing strategy contributes to advancing the reliability and applicability of synthetic data in statistical inference.
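The convergence problem the abstract describes can be seen in a toy simulation (this is an illustrative sketch, not the paper's method): stand in for the DGM with a Gaussian fitted to a small real sample, then note that the standard error of the synthetic-data mean barely shrinks as the synthetic sample size grows, because it is dominated by the uncertainty inherited from the real sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_real, reps = 200, 300  # small real sample, repeated experiments


def synthetic_mean_sd(m):
    """Std-dev of the synthetic-data mean estimator across repeated runs."""
    means = []
    for _ in range(reps):
        real = rng.normal(0.0, 1.0, n_real)          # real data, true mean 0
        mu_hat, sigma_hat = real.mean(), real.std()  # stand-in "DGM": fitted Gaussian
        synth = rng.normal(mu_hat, sigma_hat, m)     # draw m synthetic points
        means.append(synth.mean())
    return float(np.std(means))


sd_small = synthetic_mean_sd(1_000)
sd_large = synthetic_mean_sd(100_000)
# If the estimator converged at 1/sqrt(m), sd_large would be ~10x smaller
# than sd_small; instead both plateau near 1/sqrt(n_real).
ratio = sd_small / sd_large
```

The standard error plateaus near $1/\sqrt{n_{\text{real}}}$ no matter how much synthetic data is generated, which is why naive p-values and confidence intervals computed from synthetic samples can be badly miscalibrated.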




The Elusive Pursuit of Replicating PATE-GAN: Benchmarking, Auditing, Debugging

Ganev, Georgi, Annamalai, Meenatchi Sundaram Muthu Selva, De Cristofaro, Emiliano

arXiv.org Artificial Intelligence

Privacy-preserving synthetic data has been increasingly adopted to share data within and across organizations while reducing privacy risks. The intuition is to train a generative model on the real data, draw samples from the model, and create new (synthetic) data points. As the original data may contain sensitive and/or personal information, synthetic data can be vulnerable to membership/property inference, reconstruction attacks, etc. [6, 13, 25, 29, 30, 57]. Thus, models should be trained to satisfy robust definitions like Differential Privacy (DP) [19, 20], which bounds the privacy leakage from the synthetic data. Combining generative models with DP has been advocated for or deployed by government agencies [2, 31, 46, 62], regulatory bodies [60, 61], and non-profit organizations [48, 63].
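The DP guarantee referred to above bounds how much any single record can influence a released output. As a minimal self-contained illustration of that idea (the classic Laplace mechanism for a counting query, not PATE-GAN itself): a count has L1 sensitivity 1, so adding Laplace noise with scale $1/\varepsilon$ satisfies $\varepsilon$-DP.

```python
import numpy as np

rng = np.random.default_rng(7)


def dp_count(data, predicate, epsilon):
    """Release a counting query under epsilon-DP via the Laplace mechanism.

    A count has L1 sensitivity 1 (adding or removing one record changes it
    by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for x in data if predicate(x))
    return true_count + rng.laplace(scale=1.0 / epsilon)


ages = [34, 61, 28, 45, 52, 39]
noisy = dp_count(ages, lambda a: a > 40, epsilon=1.0)  # true count is 3
```

Training a generative model under DP applies the same principle to the learning procedure itself (e.g., by noising gradients or teacher votes), so that every sample subsequently drawn from the model inherits the privacy bound.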


Synthcity: facilitating innovative use cases of synthetic data in different data modalities

Qian, Zhaozhi, Cebere, Bogdan-Constantin, van der Schaar, Mihaela

arXiv.org Artificial Intelligence

Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more. Synthcity provides practitioners with a single access point to cutting-edge research and tools in synthetic data. It also offers the community a playground for rapid experimentation and prototyping, a one-stop shop for SOTA benchmarks, and an opportunity for extending research impact. The library can be accessed on GitHub and pip. We warmly invite the community to join the development effort by providing feedback, reporting bugs, and contributing code.