AITopics | pate-gan

Collaborating Authors

pate-gan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

_NIPS21__GPATE

Yunhui Long

Neural Information Processing SystemsFeb-7-2026, 15:55:50 GMT

Evaluationon Image Inception Score (IS) 1.00 1.00 1.00 1.00 (Accuracy) on Image Datasets.table

artificial intelligence, g-pate, pate-gan, (13 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Industry: Information Technology > Security & Privacy (0.50)

Technology: Information Technology > Artificial Intelligence (0.94)

Add feedback

A Comprehensive Evaluation Framework for Synthetic Trip Data Generation in Public Transport

Wu, Yuanyuan, Qin, Zhenlin, Ma, Zhenliang

arXiv.org Artificial IntelligenceOct-29-2025

Synthetic data offers a promising solution to the privacy and accessibility challenges of using smart card data in public transport research. Despite rapid progress in generative modeling, there is limited attention to comprehensive evaluation, leaving unclear how reliable, safe, and useful synthetic data truly are. Existing evaluations remain fragmented, typically limited to population-level representativeness or record-level privacy, without considering group-level variations or task-specific utility. To address this gap, we propose a Representativeness-Privacy-Utility (RPU) framework that systematically evaluates synthetic trip data across three complementary dimensions and three hierarchical levels (record, group, population). The framework integrates a consistent set of metrics to quantify similarity, disclosure risk, and practical usefulness, enabling transparent and balanced assessment of synthetic data quality. We apply the framework to benchmark twelve representative generation methods, spanning conventional statistical models, deep generative networks, and privacy-enhanced variants. Results show that synthetic data do not inherently guarantee privacy and there is no "one-size-fits-all" model, the trade-off between privacy and representativeness/utility is obvious. Conditional Tabular generative adversarial network (CTGAN) provide the most balanced trade-off and is suggested for practical applications. The RPU framework provides a systematic and reproducible basis for researchers and practitioners to compare synthetic data generation techniques and select appropriate methods in public transport applications.

data mining, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2510.24375

Country: Europe > Sweden (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

The Elusive Pursuit of Replicating PATE-GAN: Benchmarking, Auditing, Debugging

Ganev, Georgi, Annamalai, Meenatchi Sundaram Muthu Selva, De Cristofaro, Emiliano

arXiv.org Artificial IntelligenceJun-20-2024

Privacy-preserving synthetic data has been increasingly adopted to share data within and across organizations while reducing privacy risks. The intuition is to train a generative model on the real data, draw samples from the model, and create new (synthetic) data points. As the original data may contain sensitive and/or personal information, synthetic data can be vulnerable to membership/property inference, reconstruction attacks, etc. [6, 13, 25, 29, 30, 57]. Thus, models should be trained to satisfy robust definitions like Differential Privacy (DP) [19, 20], which bounds the privacy leakage from the synthetic data. Combining generative models with DP has been advocated for or deployed by government agencies [2, 31, 46, 62], regulatory bodies [60, 61], and non-profit organizations [48, 63].

dataset, implementation, synthetic data, (16 more...)

arXiv.org Artificial Intelligence

2406.13985

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report (0.83)

Industry:

Health & Medicine (0.97)
Information Technology > Security & Privacy (0.88)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Understanding how Differentially Private Generative Models Spend their Privacy Budget

Ganev, Georgi, Xu, Kai, De Cristofaro, Emiliano

arXiv.org Artificial IntelligenceMay-18-2023

Generative models trained with Differential Privacy (DP) are increasingly used to produce synthetic data while reducing privacy risks. Navigating their specific privacy-utility tradeoffs makes it challenging to determine which models would work best for specific settings/tasks. In this paper, we fill this gap in the context of tabular data by analyzing how DP generative models distribute privacy budgets across rows and columns, arguably the main source of utility degradation. We examine the main factors contributing to how privacy budgets are spent, including underlying modeling techniques, DP mechanisms, and data dimensionality. Our extensive evaluation of both graphical and deep generative models sheds light on the distinctive features that render them suitable for different settings and tasks. We show that graphical models distribute the privacy budget horizontally and thus cannot handle relatively wide datasets while the performance on the task they were optimized for monotonically increases with more data. Deep generative models spend their budget per iteration, so their behavior is less predictable with varying dataset dimensions but could perform better if trained on more features. Also, low levels of privacy ($\epsilon\geq100$) could help some models generalize, achieving better results than without applying DP.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.10994

Country:

North America > United States (0.14)
Europe > United Kingdom > England (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.44)

Add feedback

DP-SGD vs PATE: Which Has Less Disparate Impact on GANs?

Ganev, Georgi

arXiv.org Artificial IntelligenceNov-26-2021

Generative Adversarial Networks (GANs) are among the most popular approaches to generate synthetic data, especially images, for data sharing purposes. Given the vital importance of preserving the privacy of the individual data points in the original data, GANs are trained utilizing frameworks with robust privacy guarantees such as Differential Privacy (DP). However, these approaches remain widely unstudied beyond single performance metrics when presented with imbalanced datasets. To this end, we systematically compare GANs trained with the two best-known DP frameworks for deep learning, DP-SGD, and PATE, in different data imbalance settings from two perspectives -- the size of the classes in the generated synthetic data and their classification performance. Our analyses show that applying PATE, similarly to DP-SGD, has a disparate effect on the under/over-represented classes but in a much milder magnitude making it more robust. Interestingly, our experiments consistently show that for PATE, unlike DP-SGD, the privacy-utility trade-off is not monotonically decreasing but is much smoother and inverted U-shaped, meaning that adding a small degree of privacy actually helps generalization. However, we have also identified some settings (e.g., large imbalance) where PATE-GAN completely fails to learn some subparts of the training data.

imbalance, pate-gan, synthetic data, (11 more...)

arXiv.org Artificial Intelligence

2111.13617

Country: Europe > United Kingdom > England (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Robin Hood and Matthew Effects -- Differential Privacy Has Disparate Impact on Synthetic Data

Ganev, Georgi, Oprisanu, Bristena, De Cristofaro, Emiliano

arXiv.org Artificial IntelligenceSep-23-2021

Generative models trained using Differential Privacy (DP) are increasingly used to produce and share synthetic data in a privacy-friendly manner. In this paper, we set out to analyze the impact of DP on these models vis-a-vis underrepresented classes and subgroups of data. We do so from two angles: 1) the size of classes and subgroups in the synthetic data, and 2) classification accuracy on them. We also evaluate the effect of various levels of imbalance and privacy budgets. Our experiments, conducted using three state-of-the-art DP models (PrivBayes, DP-WGAN, and PATE-GAN), show that DP results in opposite size distributions in the generated synthetic data. More precisely, it affects the gap between the majority and minority classes and subgroups, either reducing it (a "Robin Hood" effect) or increasing it ("Matthew" effect). However, both of these size shifts lead to similar disparate impacts on a classifier's accuracy, affecting disproportionately more the underrepresented subparts of the data. As a result, we call for caution when analyzing or training a model on synthetic data, or risk treating different subpopulations unevenly, which might also lead to unreliable conclusions.

classifier, dataset, subgroup, (14 more...)

arXiv.org Artificial Intelligence

2109.11429

Country:

North America > United States > Texas (0.08)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.95)
Information Technology > Security & Privacy (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification

Yang, Chao-Han Huck, Siniscalchi, Sabato Marco, Lee, Chin-Hui

arXiv.org Artificial IntelligenceApr-2-2021

We propose using an adversarial autoencoder (AAE) to replace generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications. The AAE architecture allows us to obtain good synthetic speech leveraging upon a discriminative training of latent vectors. Such synthetic speech is used to build a privacy-preserving classifier when non-sensitive data is not sufficiently available in the public domain. This classifier follows the PATE scheme that uses an ensemble of noisy outputs to label the synthetic samples and guarantee $\varepsilon$-differential privacy (DP) on its derived classifiers. Our proposed framework thus consists of an AAE-based generator and a PATE-based classifier (PATE-AAE). Evaluated on the Google Speech Commands Dataset Version II, the proposed PATE-AAE improves the average classification accuracy by +$2.11\%$ and +$6.60\%$, respectively, when compared with alternative privacy-preserving solutions, namely PATE-GAN and DP-GAN, while maintaining a strong level of privacy target at $\varepsilon$=0.01 with a fixed $\delta$=10$^{-5}$.

accuracy, classification, privacy, (14 more...)

arXiv.org Artificial Intelligence

2104.01271

Country:

North America > United States > California (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
Europe > Italy (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Scalable Differentially Private Generative Student Model via PATE

Long, Yunhui, Lin, Suxin, Yang, Zhuolin, Gunter, Carl A., Li, Bo

arXiv.org Machine LearningJun-21-2019

Recent rapid development of machine learning is largely due to algorithmic breakthroughs, computation resource development, and especially the access to a large amount of training data. However, though data sharing has the great potential of improving machine learning models and enabling new applications, there have been increasing concerns about the privacy implications of data collection. In this work, we present a novel approach for training differentially private data generator G-PATE. The generator can be used to produce synthetic datasets with strong privacy guarantee while preserving high data utility. Our approach leverages generative adversarial nets (GAN) to generate data and protect data privacy based on the Private Aggregation of Teacher Ensembles (PATE) framework. Our approach improves the use of privacy budget by only ensuring differential privacy for the generator, which is the part of the model that actually needs to be published for private data generation. To achieve this, we connect a student generator with an ensemble of teacher discriminators. We also propose a private gradient aggregation mechanism to ensure differential privacy on all the information that flows from the teacher discriminators to the student generator. We empirically show that the G-PATE significantly outperforms prior work on both image and non-image datasets.

artificial intelligence, discriminator, machine learning, (17 more...)

arXiv.org Machine Learning

1906.09338

Genre:

Research Report (1.00)
Overview (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Technology > Educational Software (0.41)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback