Privacy profile


Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences

Borja Balle, Gilles Barthe, Marco Gaboardi

Neural Information Processing Systems

Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. One important tool is the so-called "privacy amplification by subsampling" principle, which ensures that a differentially private mechanism run on a random subsample of a population provides higher privacy guarantees than when run on the entire population. Several instances of this principle have been studied for different random subsampling methods, each with an ad-hoc analysis. In this paper we present a general method that recovers and improves prior analyses, yields lower bounds and derives new instances of privacy amplification by subsampling. Our method leverages a characterization of differential privacy as a divergence which emerged in the program verification community. Furthermore, it introduces new tools, including advanced joint convexity and privacy profiles, which might be of independent interest.
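A concrete instance of this principle (a minimal sketch, not the paper's general coupling-based method): for Poisson subsampling with rate q, a well-known bound states that an (ε, δ)-DP mechanism run on the subsample satisfies (ε′, qδ)-DP with ε′ = log(1 + q(e^ε − 1)).

```python
import math

def amplified_epsilon(eps: float, q: float) -> float:
    """Privacy amplification by Poisson subsampling: an (eps, delta)-DP
    mechanism run on a q-subsample satisfies (eps', q * delta)-DP with
    eps' = log(1 + q * (exp(eps) - 1))."""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))

# Subsampling strictly improves the guarantee for q < 1:
print(amplified_epsilon(1.0, 0.01))  # roughly 0.017, far below the original eps = 1
```

Note that for q = 1 (no subsampling) the bound recovers the original ε, as expected.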



Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences

Guillem Ramírez, Alexandra Birch, Ivan Titov

arXiv.org Artificial Intelligence

Large language models (LLMs) are primarily accessed via commercial APIs, but this often requires users to expose their data to service providers. In this paper, we explore how users can stay in control of their data by using privacy profiles: simple natural language instructions that say what should and should not be revealed. We build a framework where a local model uses these instructions to rewrite queries, only hiding details deemed sensitive by the user, before sending them to an external model, thus balancing privacy with performance. To support this research, we introduce PEEP, a multilingual dataset of real user queries annotated to mark private content and paired with synthetic privacy profiles. Experiments with lightweight local LLMs show that, after fine-tuning, they not only achieve markedly better privacy preservation but also match or exceed the performance of much larger zero-shot models. At the same time, the system still faces challenges in fully adhering to user instructions, underscoring the need for models with a better understanding of user-defined privacy preferences.
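The rewriting step can be sketched as follows. This is a toy stand-in, not the paper's system: the regex-to-placeholder dictionary below is a hypothetical substitute for the natural-language privacy profiles and the fine-tuned local LLM that PEEP actually uses.

```python
import re

def apply_privacy_profile(query: str, profile: dict) -> str:
    """Toy stand-in for the local rewriting model: replace each span the
    user's profile marks as private with a neutral placeholder before the
    query leaves the device for the external model."""
    rewritten = query
    for pattern, placeholder in profile.items():
        rewritten = re.sub(pattern, placeholder, rewritten)
    return rewritten

# Hypothetical profile: hide ID numbers and the user's employer.
profile = {r"\b\d{3}-\d{2}-\d{4}\b": "[ID]",
           r"\bAcme Corp\b": "[EMPLOYER]"}
query = "I work at Acme Corp and my SSN is 123-45-6789, what taxes do I owe?"
print(apply_privacy_profile(query, profile))
```

The design point the paper studies is exactly this trade-off: hiding only what the profile deems sensitive, so the external model still receives enough context to answer well.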



$(\varepsilon, \delta)$ Considered Harmful: Best Practices for Reporting Differential Privacy Guarantees

Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Jamie Hayes, Borja Balle, Antti Honkela

arXiv.org Machine Learning

Differential privacy (DP) (Dwork et al., 2006; Dwork & Roth, 2014) has emerged as the gold standard for privacy-preserving machine learning with provable privacy guarantees. The past two decades have seen significant progress in understanding the precise privacy properties of different algorithms, as well as the emergence of many new privacy formalisms (Desfontaines & Pejó, 2020). Despite this multitude of formalisms, the standard way of reporting privacy guarantees has been to use (ε, δ)-DP (Dwork & Roth, 2014) with a fixed and small δ. The parameter δ is commonly suggested to be significantly smaller than 1/N for a dataset of N individuals, e.g., cryptographically small (Vadhan, 2017; Ponomareva et al., 2023); however, exact values vary in the literature, and δ is ultimately an arbitrary parameter that practitioners must choose ad hoc. This arbitrariness leads to downstream problems, the most important of which is that the privacy budget ε is incomparable across algorithms (Kaissis et al., 2024). Additionally, (ε, δ)-DP with a single δ is a poor representation of the actual privacy guarantees of most practical machine learning algorithms, which leads to severe overestimation of risk when converting it to interpretable bounds on the success rates of attacks aiming to infer private information in the training data (Kulynych et al., 2024), as illustrated in Figure 1. In this paper, we make the empirical observation that various practical deployments of DP machine learning algorithms, when analysed with modern numerical algorithms known as accountants (Koskela & Honkela, 2021; Gopi et al., 2021; Alghamdi et al., 2023; Doroshenko et al., 2022), are almost exactly characterized by a notion of privacy known as Gaussian DP (GDP) (Dong et al., 2022). In particular, we observe this behavior for DP large-scale image classification (De et al., 2022) and the TopDown algorithm for the U.S. Decennial Census (Abowd et al., 2022).
This observation is also consistent with the fact that the privacy of the widely used Gaussian mechanism (Dwork & Roth, 2014) is perfectly captured by GDP, and according to the Central Limit Theorem of DP (Dong et al., 2022), the privacy guarantees of a composed algorithm, i.e., one that consists of many applications of simpler building-block DP algorithms, approach those of the Gaussian mechanism.
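The connection between GDP and (ε, δ)-DP referred to above is explicit: a μ-GDP guarantee implies the full curve δ(ε) = Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2), where Φ is the standard normal CDF (Dong et al., 2022). A minimal sketch of that conversion:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function (no SciPy needed)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gdp_to_delta(mu: float, eps: float) -> float:
    """(eps, delta)-DP curve implied by a mu-GDP guarantee (Dong et al., 2022):
    delta(eps) = Phi(-eps/mu + mu/2) - e^eps * Phi(-eps/mu - mu/2)."""
    return normal_cdf(-eps / mu + mu / 2.0) - math.exp(eps) * normal_cdf(-eps / mu - mu / 2.0)

# A single mu describes the whole (eps, delta) trade-off curve,
# rather than one arbitrarily chosen (eps, delta) point:
for eps in (0.5, 1.0, 2.0):
    print(eps, gdp_to_delta(1.0, eps))
```

This is why reporting the GDP parameter (or the full curve) avoids the arbitrariness of fixing a single δ.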


Privacy amplification by random allocation

Vitaly Feldman, Moshe Shenfeld

arXiv.org Artificial Intelligence

We consider the privacy guarantees of an algorithm in which a user's data is used in $k$ steps chosen randomly and uniformly from a sequence (or set) of $t$ differentially private steps. We demonstrate that the privacy guarantees of this sampling scheme can be upper bounded by the privacy guarantees of the well-studied independent (or Poisson) subsampling, in which each step uses the user's data with probability $(1+ o(1))k/t$. Further, we provide two additional analysis techniques that lead to numerical improvements in some parameter regimes. The case of $k=1$ has been previously studied in the context of DP-SGD in Balle et al. (2020) and very recently in Chua et al. (2024a). The privacy analysis of Balle et al. (2020) relies on privacy amplification by shuffling, which leads to overly conservative bounds. The privacy analysis of Chua et al. (2024a) relies on Monte Carlo simulations that are computationally prohibitive in many practical scenarios and have additional inherent limitations.
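The two sampling schemes being compared can be stated in a few lines (an illustration of the schemes only, not of the paper's privacy analysis):

```python
import random

def allocate_uniform(k: int, t: int) -> list:
    """Random allocation: the user's data participates in exactly k of the
    t steps, chosen uniformly without replacement."""
    return sorted(random.sample(range(t), k))

def allocate_poisson(k: int, t: int) -> list:
    """Poisson subsampling: each step independently includes the user's data
    with probability k/t, so the participation count is Binomial(t, k/t)."""
    return [s for s in range(t) if random.random() < k / t]

random.seed(0)
# Both schemes give the same marginal participation rate k/t per step; the
# paper shows the allocation scheme's guarantees are upper bounded by those
# of Poisson subsampling at rate (1 + o(1)) k/t.
counts = [len(allocate_poisson(4, 100)) for _ in range(10000)]
print(sum(counts) / len(counts))  # close to k = 4, but with random fluctuation
```

The difference is that allocation fixes the participation count at exactly k, while Poisson subsampling only matches it in expectation.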


Privacy Amplification by Structured Subsampling for Deep Differentially Private Time Series Forecasting

Jan Schuchardt, Mina Dalirrooyfard, Jed Guzelkabaagac, Anderson Schneider, Yuriy Nevmyvaka, Stephan Günnemann

arXiv.org Machine Learning

Many forms of sensitive data, such as web traffic, mobility data, or hospital occupancy, are inherently sequential. The standard method for training machine learning models while ensuring privacy for units of sensitive information, such as individual hospital visits, is differentially private stochastic gradient descent (DP-SGD). However, we observe in this work that the formal guarantees of DP-SGD are incompatible with time-series-specific tasks like forecasting, since they rely on the privacy amplification attained by training on small, unstructured batches sampled from an unstructured dataset. In contrast, batches for forecasting are generated by (1) sampling sequentially structured time series from a dataset, (2) sampling contiguous subsequences from these series, and (3) partitioning them into context and ground-truth forecast windows. We theoretically analyze the privacy amplification attained by this structured subsampling to enable the training of forecasting models with sound and tight event- and user-level privacy guarantees. Towards more private models, we additionally prove how data augmentation amplifies privacy in self-supervised training of sequence models. Our empirical evaluation demonstrates that amplification by structured subsampling enables the training of forecasting models with strong formal privacy guarantees.
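The three-stage batch construction described in the abstract can be sketched directly (a minimal illustration with toy data, not the paper's implementation):

```python
import random

def sample_forecasting_batch(dataset, batch_size, context_len, forecast_len):
    """Batch construction for forecasting: (1) sample a time series,
    (2) sample a contiguous subsequence, (3) split it into a context
    window and a ground-truth forecast window."""
    window = context_len + forecast_len
    batch = []
    for _ in range(batch_size):
        series = random.choice(dataset)                      # (1) sample a series
        start = random.randrange(len(series) - window + 1)   # (2) contiguous subsequence
        sub = series[start:start + window]
        batch.append((sub[:context_len], sub[context_len:])) # (3) context / forecast split
    return batch

random.seed(1)
data = [list(range(i, i + 50)) for i in range(3)]  # three toy series of length 50
batch = sample_forecasting_batch(data, batch_size=2, context_len=8, forecast_len=4)
```

Unlike Poisson subsampling of independent records, every element of such a batch is correlated with its neighbours in time, which is exactly why the standard DP-SGD amplification argument does not apply.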


Laplace Transform Interpretation of Differential Privacy

Rishav Chourasia, Uzair Javaid, Biplab Sikdar

arXiv.org Artificial Intelligence

Differential privacy (DP) [13] has become a widely adopted standard for quantifying the privacy of algorithms that process statistical data. In simple terms, differential privacy bounds the influence a single data point may have on the outcome probabilities. Being a statistical property, the design of differentially private algorithms involves a pen-and-paper analysis of any randomness internal to the processing that obscures the influence a data point might have on its output. A clear understanding of the nature of differential privacy notions is therefore central to the study and design of privacy-preserving algorithms. Over the years, various functional interpretations of the concept of differential privacy have emerged. These include the privacy-profile curve δ(ε) [5] that traces the (ε, δ)-DP point guarantees, the f-DP [11] view of the worst-case trade-off curve between type I and type II errors in membership hypothesis testing [19, 6], the Rényi DP [23] function of order q that admits a natural analytical composition [1, 23], the privacy loss distribution (PLD) view [29] that allows for approximate numerical composition [20, 18], and the recent characteristic-function formulation of the dominating privacy loss random variables by Zhu et al. [32]. Each of these formalisms has its own properties and use cases, and none of them seems to be superior in all aspects. Regardless of their differences, they all share some difficulties: certain types of manipulations on them are harder to perform in the time domain but considerably simpler in the frequency domain. For instance, Koskela et al. [20] noted that composing the PLDs of two mechanisms involves convolving their probability densities, which can be numerically approximated efficiently.
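The convolution fact mentioned at the end is easy to see on a discretized grid. The sketch below composes two toy PLDs represented as probability vectors on a shared uniform loss grid; real accountants additionally handle grid alignment, truncation, and error bounds, all of which are omitted here.

```python
import numpy as np

def compose_plds(pld_a, pld_b):
    """Composing two mechanisms adds their privacy loss random variables,
    so the PLD of the composition is the convolution of the individual
    PLDs (Koskela et al.). Inputs are probability vectors on a shared
    uniform grid of privacy-loss values; the output lives on the grid
    of pairwise sums."""
    return np.convolve(pld_a, pld_b)

# A toy PLD on a three-point grid; self-composition is its self-convolution.
pld = np.array([0.2, 0.5, 0.3])
composed = compose_plds(pld, pld)
print(composed.sum())  # still a probability distribution: sums to 1
```

Convolution in the time domain is multiplication in the frequency domain, which is precisely the manipulation the Laplace transform view makes simple.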


The 2020 United States Decennial Census Is More Private Than You (Might) Think

Buxin Su, Weijie J. Su, Chendi Wang

arXiv.org Machine Learning

The U.S. Decennial Census serves as the foundation for many high-profile policy decision-making processes, including federal funding allocation and redistricting. In 2020, the Census Bureau adopted differential privacy to protect the confidentiality of individual responses through a disclosure avoidance system that injects noise into census data tabulations. The Bureau subsequently posed an open question: Could sharper privacy guarantees be obtained for the 2020 U.S. Census compared to their published guarantees, or equivalently, had the nominal privacy budgets been fully utilized? In this paper, we affirmatively address this open problem by demonstrating that between 8.50% and 13.76% of the privacy budget for the 2020 U.S. Census remains unused for each of the eight geographical levels, from the national level down to the block level. This finding is made possible through our precise tracking of privacy losses using $f$-differential privacy, applied to the composition of private queries across various geographical levels. Our analysis indicates that the Census Bureau introduced unnecessarily high levels of injected noise to achieve the claimed privacy guarantee for the 2020 U.S. Census. Consequently, our results enable the Bureau to reduce noise variances by 15.08% to 24.82% while maintaining the same privacy budget for each geographical level, thereby enhancing the accuracy of privatized census statistics. We empirically demonstrate that reducing noise injection into census statistics mitigates distortion caused by privacy constraints in downstream applications of private census data, illustrated through a study examining the relationship between earnings and education.
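The f-DP machinery the paper relies on has a closed form in the Gaussian case: a μ-GDP mechanism has trade-off curve f(α) = Φ(Φ⁻¹(1 − α) − μ), and composing Gaussian mechanisms stays Gaussian with the parameters combining as a Euclidean norm. A minimal sketch with made-up per-level parameters (the actual Census allocations are not reproduced here):

```python
import math
from statistics import NormalDist

N = NormalDist()

def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """mu-GDP trade-off curve: the smallest type II error achievable at
    type I error alpha when distinguishing neighbouring datasets."""
    return N.cdf(N.inv_cdf(1.0 - alpha) - mu)

def compose_gaussian(mus) -> float:
    """Composition of Gaussian mechanisms is Gaussian: the overall
    parameter is the Euclidean norm of the per-query parameters."""
    return math.sqrt(sum(m * m for m in mus))

# Hypothetical per-level parameters, one per geographical level:
mu_total = compose_gaussian([0.3] * 8)
print(mu_total, gaussian_tradeoff(0.05, mu_total))
```

Tracking the full trade-off curve through composition like this, rather than a single (ε, δ) point, is what allows losses to be accounted tightly enough to reveal unused budget.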


Beyond the Calibration Point: Mechanism Comparison in Differential Privacy

Georgios Kaissis, Stefan Kolek, Borja Balle, Jamie Hayes, Daniel Rueckert

arXiv.org Machine Learning

In differentially private (DP) machine learning, the privacy guarantees of DP mechanisms are often reported and compared on the basis of a single $(\varepsilon, \delta)$-pair. This practice overlooks that DP guarantees can vary substantially even between mechanisms sharing a given $(\varepsilon, \delta)$, and potentially introduces privacy vulnerabilities which can remain undetected. This motivates the need for robust, rigorous methods for comparing DP guarantees in such cases. Here, we introduce the $\Delta$-divergence between mechanisms which quantifies the worst-case excess privacy vulnerability of choosing one mechanism over another in terms of $(\varepsilon, \delta)$, $f$-DP and in terms of a newly presented Bayesian interpretation. Moreover, as a generalisation of the Blackwell theorem, it is endowed with strong decision-theoretic foundations. Through application examples, we show that our techniques can facilitate informed decision-making and reveal gaps in the current understanding of privacy risks, as current practices in DP-SGD often result in choosing mechanisms with high excess privacy vulnerabilities.
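The motivating observation can be illustrated by scanning whole δ(ε) curves instead of one point. The sketch below compares the privacy profile of a μ-GDP mechanism with that of ε₀-randomized response; the worst pointwise gap computed here is only a crude illustration, not the paper's Δ-divergence, which has a decision-theoretic definition.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def delta_gaussian(eps: float, mu: float) -> float:
    """(eps, delta) privacy profile implied by mu-GDP (Dong et al., 2022)."""
    return Phi(-eps / mu + mu / 2.0) - math.exp(eps) * Phi(-eps / mu - mu / 2.0)

def delta_rr(eps: float, eps0: float) -> float:
    """Tight privacy profile of eps0-randomized response:
    delta(eps) = (e^eps0 - e^eps) / (1 + e^eps0) for eps <= eps0, else 0."""
    return max(0.0, (math.exp(eps0) - math.exp(eps)) / (1.0 + math.exp(eps0)))

# Two mechanisms can look similar at one (eps, delta) point yet have very
# different profiles elsewhere; scan a grid to see the worst pointwise gap.
grid = [i / 100.0 for i in range(0, 301)]
gap = max(abs(delta_gaussian(e, 1.0) - delta_rr(e, 3.0)) for e in grid)
print(gap)
```

Any single calibration point hides this gap, which is exactly the vulnerability the paper's comparison method is designed to expose.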