AITopics | sensitive dataset

Collaborating Authors

sensitive dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Private Evolution Converges

González, Tomás, Fanti, Giulia, Ramdas, Aaditya

arXiv.org Artificial IntelligenceNov-18-2025

Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data) is less consistent. To date, the only theoretical analysis of the convergence of PE depends on unrealistic assumptions about both the algorithm's behavior and the structure of the sensitive dataset. In this work, we develop a new theoretical framework to understand PE's practical behavior and identify sufficient conditions for its convergence. For $d$-dimensional sensitive datasets with $n$ data points from a convex and compact domain, we prove that under the right hyperparameter settings and given access to the Gaussian variation API proposed in \cite{PE23}, PE produces an $(\varepsilon, δ)$-DP synthetic dataset with expected 1-Wasserstein distance $\tilde{O}(d(n\varepsilon)^{-1/d})$ from the original; this establishes worst-case convergence of the algorithm as $n \to \infty$. Our analysis extends to general Banach spaces as well. We also connect PE to the Private Signed Measure Mechanism, a method for DP synthetic data generation that has thus far not seen much practical adoption. We demonstrate the practical relevance of our theoretical findings in experiments.

artificial intelligence, histogram, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2506.08312

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis

Gong, Chen, Li, Kecen, Lin, Zinan, Wang, Tianhao

arXiv.org Artificial IntelligenceMar-18-2025

Differentially private (DP) image synthesis aims to generate artificial images that retain the properties of sensitive images while protecting the privacy of individual images within the dataset. Despite recent advancements, we find that inconsistent--and sometimes flawed--evaluation protocols have been applied across studies. This not only impedes the understanding of current methods but also hinders future advancements. To address the issue, this paper introduces DPImageBench for DP image synthesis, with thoughtful design across several dimensions: (1) Methods. We study eleven prominent methods and systematically characterize each based on model architecture, pretraining strategy, and privacy mechanism. (2) Evaluation. We include nine datasets and seven fidelity and utility metrics to thoroughly assess them. Notably, we find that a common practice of selecting downstream classifiers based on the highest accuracy on the sensitive test set not only violates DP but also overestimates the utility scores. DPImageBench corrects for these mistakes. (3) Platform. Despite the methods and evaluation protocols, DPImageBench provides a standardized interface that accommodates current and future implementations within a unified framework. With DPImageBench, we have several noteworthy findings. For example, contrary to the common wisdom that pretraining on public image datasets is usually beneficial, we find that the distributional similarity between pretraining and sensitive images significantly impacts the performance of the synthetic images and does not always yield improvements. In addition, adding noise to low-dimensional features, such as the high-level characteristics of sensitive images, is less affected by the privacy budget compared to adding noise to high-dimensional features, like weight gradients. The former methods perform better than the latter under a low privacy budget.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.14681

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Virginia (0.04)
Europe (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Meticulously Selecting 1% of the Dataset for Pre-training! Generating Differentially Private Images Data with Semantics Query

Li, Kecen, Gong, Chen, Li, Zhixiang, Zhao, Yuzhong, Hou, Xinwen, Wang, Tianhao

arXiv.org Artificial IntelligenceOct-19-2023

Differential Privacy (DP) image data synthesis, which leverages the DP technique to generate synthetic data to replace the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous methods incorporate the advanced techniques of generative models and pre-training on a public dataset to produce exceptional DP image data, but suffer from problems of unstable training and massive computational resource demands. This paper proposes a novel DP image synthesis method, termed PRIVIMAGE, which meticulously selects pre-training data, promoting the efficient creation of DP datasets with high fidelity and utility. PRIVIMAGE first establishes a semantic query function using a public dataset. Then, this function assists in querying the semantic distribution of the sensitive dataset, facilitating the selection of data from the public dataset with analogous semantics for pre-training. Finally, we pre-train an image generative model using the selected data and then fine-tune this model on the sensitive dataset using Differentially Private Stochastic Gradient Descent (DP-SGD). PRIVIMAGE allows us to train a lightly parameterized generative model, reducing the noise in the gradient during DP-SGD training and enhancing training stability. Extensive experiments demonstrate that PRIVIMAGE uses only 1% of the public dataset for pre-training and 7.6% of the parameters in the generative model compared to the state-of-the-art method, whereas achieves superior synthetic performance and conserves more computational resources. On average, PRIVIMAGE achieves 30.1% lower FID and 12.6% higher Classification Accuracy than the state-of-the-art method. The replication package and datasets can be accessed online.

dataset, public dataset, sensitive dataset, (14 more...)

arXiv.org Artificial Intelligence

2311.1285

Country:

North America > United States > Virginia (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Communications > Web > Semantic Web (0.71)

Add feedback

DAMIA: Leveraging Domain Adaptation as a Defense against Membership Inference Attacks

Huang, Hongwei, Luo, Weiqi, Zeng, Guoqiang, Weng, Jian, Zhang, Yue, Yang, Anjia

arXiv.org Artificial IntelligenceMay-16-2020

Deep Learning (DL) techniques allow ones to train models from a dataset to solve tasks. DL has attracted much interest given its fancy performance and potential market value, while security issues are amongst the most colossal concerns. However, the DL models may be prone to the membership inference attack, where an attacker determines whether a given sample is from the training dataset. Efforts have been made to hinder the attack but unfortunately, they may lead to a major overhead or impaired usability. In this paper, we propose and implement DAMIA, leveraging Domain Adaptation (DA) as a defense aginist membership inference attacks. Our observation is that during the training process, DA obfuscates the dataset to be protected using another related dataset, and derives a model that underlyingly extracts the features from both datasets. Seeing that the model is obfuscated, membership inference fails, while the extracted features provide supports for usability. Extensive experiments have been conducted to validates our intuition. The model trained by DAMIA has a negligible footprint to the usability. Our experiment also excludes factors that may hinder the performance of DAMIA, providing a potential guideline to vendors and researchers to benefit from our solution in a timely manner.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2005.08016

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts > Middlesex County > Lowell (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback