Talwar, Kunal
On Privately Estimating a Single Parameter
Asi, Hilal, Duchi, John C., Talwar, Kunal
We investigate differentially private estimators for individual parameters within larger parametric models. While generic private estimators exist, the estimators we provide rest on new local notions of estimand stability, and these notions allow procedures that provide private certificates of their own stability. By leveraging these private certificates, we provide computationally and statistically efficient mechanisms that release private statistics that are, at least asymptotically in the sample size, essentially unimprovable: they achieve instance-optimal bounds. Additionally, we investigate the practicality of the algorithms both on simulated data and on real-world data from the American Community Survey and US Census, highlighting scenarios in which the new procedures are successful and identifying areas for future work.
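The notion of a private stability certificate can be illustrated with a generic propose-test-release-style sketch in Python (an illustrative template under assumed interfaces, not the paper's construction): privately test a distance-to-instability score before releasing a noised estimate.

    import numpy as np

    def ptr_release(data, estimator, stability_score, sensitivity, eps, threshold):
        """Propose-test-release sketch: release estimator(data) with Laplace
        noise only if a privately tested stability score clears a threshold.
        `stability_score(data)` is assumed to return a distance-to-instability
        measure with sensitivity 1; a generic template, not the paper's
        certificate construction."""
        rng = np.random.default_rng()
        # Privately test stability: noisy distance-to-instability.
        noisy_score = stability_score(data) + rng.laplace(scale=1.0 / eps)
        if noisy_score < threshold:
            return None  # the instance looks unstable; refuse to release
        # Release the estimate with noise calibrated to the sensitivity.
        return estimator(data) + rng.laplace(scale=sensitivity / eps)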
PREAMBLE: Private and Efficient Aggregation of Block Sparse Vectors and Applications
Asi, Hilal, Feldman, Vitaly, Keller, Hannah, Rothblum, Guy N., Talwar, Kunal
We revisit the problem of secure aggregation of high-dimensional vectors in a two-server system such as Prio. These systems are typically used to aggregate vectors such as gradients in private federated learning, where the aggregate itself is protected via noise addition to ensure differential privacy. Existing approaches require communication that scales with the dimensionality, and thus limit the dimensionality of vectors one can efficiently process in this setup. We propose PREAMBLE: Private Efficient Aggregation Mechanism for BLock-sparse Euclidean Vectors. PREAMBLE is a novel extension of distributed point functions that enables communication- and computation-efficient aggregation of block-sparse vectors, which are sparse vectors whose non-zero entries occur in a small number of clusters of consecutive coordinates. We then show that PREAMBLE can be combined with random sampling and privacy-amplification-by-sampling results to obtain asymptotically optimal privacy-utility trade-offs for vector aggregation, at a fraction of the communication cost. When coupled with recent advances in numerical privacy accounting, our approach incurs a negligible overhead in noise variance compared to the Gaussian mechanism used with Prio.
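As a plaintext illustration of the aggregation task itself (setting aside the two-server secret sharing and distributed point functions that make PREAMBLE private and efficient), a block-sparse vector can be stored as a map from block offsets to blocks, with the Gaussian mechanism applied to the aggregate; names and parameters below are illustrative assumptions.

    import numpy as np

    def aggregate_block_sparse(vectors, d, block_size, sigma):
        """Sum block-sparse vectors given as {block_start: block_values} dicts,
        then add Gaussian noise for differential privacy. A plaintext sketch
        only: in PREAMBLE this aggregation happens under two-server secret
        sharing, so no server ever sees an individual vector."""
        total = np.zeros(d)
        for vec in vectors:
            for start, block in vec.items():
                total[start:start + block_size] += block
        # sigma is assumed calibrated to the L2 norm bound of a single vector.
        return total + np.random.default_rng().normal(0.0, sigma, size=d)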
Adaptive Batch Size for Privately Finding Second-Order Stationary Points
Liu, Daogao, Talwar, Kunal
There is a gap between finding a first-order stationary point (FOSP) and a second-order stationary point (SOSP) under differential privacy constraints, and it remains unclear whether privately finding an SOSP is more challenging than finding an FOSP. Specifically, Ganesh et al. (2023) demonstrated that an $\alpha$-SOSP can be found with $\alpha=O(\frac{1}{n^{1/3}}+(\frac{\sqrt{d}}{n\epsilon})^{3/7})$, where $n$ is the dataset size, $d$ is the dimension, and $\epsilon$ is the differential privacy parameter. Building on the SpiderBoost algorithm framework, we propose a new approach that uses adaptive batch sizes and incorporates the binary tree mechanism. Our method improves the results for privately finding an SOSP, achieving $\alpha=O(\frac{1}{n^{1/3}}+(\frac{\sqrt{d}}{n\epsilon})^{1/2})$. This improved bound matches the state-of-the-art for finding an FOSP, suggesting that privately finding an SOSP may be achievable at no additional cost.
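For context, the standard (Nesterov-Polyak-style) second-order stationarity notion used in this line of work, for a $\rho$-Hessian-Lipschitz objective $F$, requires both a small gradient and no strongly negative curvature; exact constants vary across papers:

$$\|\nabla F(x)\| \le \alpha \qquad \text{and} \qquad \lambda_{\min}\!\big(\nabla^2 F(x)\big) \ge -\sqrt{\rho\,\alpha}.$$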
Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization
Kornowski, Guy, Liu, Daogao, Talwar, Kunal
We study optimization problems in which the objective $f:\mathbb{R}^d \to \mathbb{R}$ may be neither smooth nor convex. Such problems are ubiquitous throughout machine learning, where losses given by deep-learning-based models give rise to highly nonsmooth nonconvex (NSNC) landscapes. Due to its fundamental importance in modern machine learning, the field of nonconvex optimization has received substantial attention in recent years. Moving away from the classical regime of convex optimization, many works have aimed at understanding the complexity of producing approximate stationary points, namely points with small gradient norm [Ghadimi and Lan, 2013; Fang et al., 2018; Carmon et al., 2020; Arjevani et al., 2023]. As it turns out, without smoothness it is impossible to directly minimize the gradient norm without suffering a worst-case runtime exponential in the dimension [Kornowski and Shamir, 2022]. Nonetheless, a relaxed notion known as Goldstein stationarity [Goldstein, 1977] has been shown in recent years to enable favorable guarantees.
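Concretely, a point $x$ is a $(\delta,\epsilon)$-Goldstein stationary point of $f$ if some convex combination of generalized gradients taken within distance $\delta$ of $x$ has norm at most $\epsilon$:

$$\min\Big\{\|g\| \;:\; g \in \mathrm{conv}\Big(\bigcup_{y:\,\|y-x\|\le\delta} \partial f(y)\Big)\Big\} \le \epsilon.$$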
Instance-Optimal Private Density Estimation in the Wasserstein Distance
Feldman, Vitaly, McMillan, Audra, Sivakumar, Satchit, Talwar, Kunal
Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work, we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances. For distributions $P$ over $\mathbb{R}$, we consider a strong notion of instance-optimality: an algorithm that uniformly achieves the instance-optimal estimation rate is competitive with an algorithm that is told that the distribution is either $P$ or $Q_P$ for some distribution $Q_P$ whose probability density function (pdf) is within a factor of 2 of the pdf of $P$. For distributions over $\mathbb{R}^2$, we use a different notion of instance optimality. We say that an algorithm is instance-optimal if it is competitive with an algorithm that is given a constant-factor multiplicative approximation of the density of the distribution. We characterize the instance-optimal estimation rates in both these settings and show that they are uniformly achievable (up to polylogarithmic factors). Our approach for $\mathbb{R}^2$ extends to arbitrary metric spaces, as it goes via hierarchically separated trees. As a special case, our results yield instance-optimal private learning in TV distance for discrete distributions.
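On the real line, the Wasserstein-1 distance has a closed form in terms of the CDFs $F_P$ and $F_Q$, which makes precise the intuition that slightly misplaced population mass incurs only a small cost:

$$W_1(P, Q) = \int_{\mathbb{R}} \big|F_P(t) - F_Q(t)\big|\, dt.$$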
Private Online Learning via Lazy Algorithms
Asi, Hilal, Koren, Tomer, Liu, Daogao, Talwar, Kunal
We study the problem of private online learning, specifically, online prediction from experts (OPE) and online convex optimization (OCO). We propose a new transformation that turns lazy online learning algorithms into private algorithms, and apply it to differentially private OPE and OCO using existing lazy algorithms for these problems. Our final algorithms obtain regret bounds that significantly improve on the best known regret in the high-privacy regime $\varepsilon \ll 1$, achieving $\sqrt{T \log d} + T^{1/3} \log(d)/\varepsilon^{2/3}$ for DP-OPE and $\sqrt{T} + T^{1/3} \sqrt{d}/\varepsilon^{2/3}$ for DP-OCO. We also complement our results with a lower bound for DP-OPE, showing that these rates are optimal for a natural family of low-switching private algorithms.
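A minimal sketch of the low-switching idea such transformations exploit (a batched follow-the-noisy-leader baseline under assumed bounded losses, not the paper's transformation): if the chosen expert changes only every batch of rounds, privacy noise is paid only at those few switch points.

    import numpy as np

    def batched_noisy_leader(losses, num_experts, batch_size, eps):
        """Online prediction from experts with few switches: a new expert is
        chosen only once per batch, from Laplace-noised cumulative losses
        (assumed to lie in [0, 1] per round). The noise scale is illustrative;
        a full analysis composes privacy over the T / batch_size releases.
        Not the paper's algorithm."""
        rng = np.random.default_rng()
        cum = np.zeros(num_experts)
        expert, picks = 0, []
        for t, loss_vec in enumerate(losses):
            if t % batch_size == 0:
                noisy = cum + rng.laplace(scale=batch_size / eps, size=num_experts)
                expert = int(np.argmin(noisy))  # switch point
            picks.append(expert)
            cum += loss_vec
        return picks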
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
Asi, Hilal, Feldman, Vitaly, Nelson, Jelani, Nguyen, Huy L., Talwar, Kunal, Zhou, Samson
We study the problem of private vector mean estimation in the shuffle model of privacy, where $n$ users each hold a unit vector $v^{(i)} \in \mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega(\min(n\varepsilon^2,d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = \Theta(1)$. Finally, we study robustness to malicious users and show that, with a single shuffler, malicious users can introduce large additive error.
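One standard way to see why many messages help in the shuffle model is the split-and-mix pattern (a simplified real-valued sketch, not the paper's protocol): each user splits its vector into additive shares that individually look random, and the shuffler erases which shares came from which user.

    import numpy as np

    def split_and_mix(vectors, num_messages):
        """Each user sends num_messages additive shares of its vector; after
        shuffling, only the total is recoverable. Shares here are Gaussian
        blinds over the reals for illustration; real protocols work over a
        finite group and add noise for differential privacy."""
        rng = np.random.default_rng()
        shares = []
        for v in vectors:
            blinds = rng.normal(0.0, 1.0, size=(num_messages - 1, v.shape[0]))
            shares.append(np.vstack([blinds, (v - blinds.sum(axis=0))[None, :]]))
        all_msgs = np.concatenate(shares, axis=0)
        rng.shuffle(all_msgs, axis=0)  # the shuffler permutes all messages
        return all_msgs.sum(axis=0) / len(vectors)  # mean of the users' vectors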
Stronger Privacy Amplification by Shuffling for Rényi and Approximate Differential Privacy
Feldman, Vitaly, McMillan, Audra, Talwar, Kunal
The shuffle model of differential privacy has gained significant interest as an intermediate trust model between the standard local and central models [EFMRTT19; CSUZZ19]. A key result in this model is that randomly shuffling locally randomized data amplifies differential privacy guarantees. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously [BEMMRLRKTS17]. In this work, we improve the state-of-the-art privacy-amplification-by-shuffling results both theoretically and numerically. Our first contribution is the first asymptotically optimal analysis of the Rényi differential privacy parameters for the shuffled outputs of LDP randomizers. Our second contribution is a new analysis of privacy amplification by shuffling. This analysis improves on the techniques of [FMT20] and leads to tighter numerical bounds in all parameter settings.
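Quantitatively, bounds of this type (stated here only up to constants, for suitable parameter ranges) say that shuffling the reports of $n$ users, each running an $\varepsilon_0$-LDP randomizer, yields a central $(\varepsilon, \delta)$-DP guarantee with

$$\varepsilon = O\!\left( \big(1 - e^{-\varepsilon_0}\big) \sqrt{\frac{e^{\varepsilon_0} \log(1/\delta)}{n}} \right).$$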
Federated Learning with Differential Privacy for End-to-End Speech Recognition
Pelikan, Martin, Azam, Sheikh Shams, Feldman, Vitaly, Silovsky, Jan "Honza", Talwar, Kunal, Likhomanenko, Tatiana
While federated learning (FL) has recently emerged as a promising approach to training machine learning models, its use in automatic speech recognition (ASR) is limited to preliminary explorations. Moreover, FL does not inherently guarantee user privacy and requires differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP to FL for ASR. In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. First, we extend the existing research on FL for ASR by exploring different aspects of recent large end-to-end transformer models: architecture design, seed models, data heterogeneity, domain shift, and impact of cohort size. With a practical number of central aggregations we are able to train FL models that are nearly optimal even with heterogeneous data, a seed model from another domain, or no pre-trained seed model. Second, we apply DP to FL for ASR, which is non-trivial since DP noise severely affects model training, especially for large transformer models, due to highly imbalanced gradients in the attention block. We counteract the adverse effect of DP noise by reviving per-layer clipping and explaining why its effect is more apparent in our case than in prior work. Remarkably, we achieve user-level $(7.2, 10^{-9})$-DP (resp. $(4.5, 10^{-9})$-DP) with only a 1.3% (resp. 4.6%) absolute drop in word error rate when extrapolating to high (resp. low) population scale for FL with DP in ASR.
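The contrast between per-layer and global clipping is easy to state in code; the sketch below assumes per-example gradients arrive as a list of per-layer arrays (names and interfaces are illustrative, not the paper's implementation).

    import numpy as np

    def clip_per_layer(grads, layer_bounds):
        """Clip each layer's gradient to its own L2 bound, so layers with
        outsized gradients (e.g. in attention blocks) cannot consume the
        entire clipping budget."""
        clipped = []
        for g, c in zip(grads, layer_bounds):
            norm = float(np.linalg.norm(g))
            clipped.append(g * min(1.0, c / (norm + 1e-12)))
        return clipped

    def clip_global(grads, c):
        """Standard global clipping: a single L2 bound over the concatenation
        of all layers' gradients."""
        total = np.sqrt(sum(float(np.linalg.norm(g)) ** 2 for g in grads))
        scale = min(1.0, c / (total + 1e-12))
        return [g * scale for g in grads]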
Mean Estimation with User-level Privacy under Data Heterogeneity
Cummings, Rachel, Feldman, Vitaly, McMillan, Audra, Talwar, Kunal
A key challenge in many modern data analysis tasks is that user data are heterogeneous. Different users may possess vastly different numbers of data points. More importantly, it cannot be assumed that all users sample from the same underlying distribution. This is true, for example, in language data, where different speech styles result in data heterogeneity. In this work we propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity, and provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator and also prove general lower bounds on the error achievable in the setting we introduce.
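A minimal sketch of the user-level baseline this setting builds on (illustrative, not the paper's heterogeneity-adaptive estimator): collapse each user to one clipped summary so that no user, however many points they hold, can move the output by much.

    import numpy as np

    def user_level_dp_mean(user_data, clip_radius, eps):
        """Each user contributes a single clipped local mean; Laplace noise is
        calibrated to the user-level sensitivity 2 * clip_radius / n. Scalar
        data assumed; an illustrative baseline, not the paper's estimator."""
        rng = np.random.default_rng()
        local_means = [np.clip(np.mean(x), -clip_radius, clip_radius)
                       for x in user_data]  # one summary per user
        n = len(local_means)
        sensitivity = 2.0 * clip_radius / n
        return float(np.mean(local_means)) + rng.laplace(scale=sensitivity / eps)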