Acharya, Jayadev
Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks
Di, Jimmy Z., Douglas, Jack, Acharya, Jayadev, Kamath, Gautam, Sekhari, Ayush
We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings where model retraining may be induced. An adversary first adds a few carefully crafted points to the training dataset such that the impact on the model's predictions is minimal. The adversary subsequently triggers a request to remove a subset of the introduced points, at which point the attack is unleashed and the model's predictions are negatively affected. In particular, we consider clean-label targeted attacks (in which the goal is to cause the model to misclassify a specific test point) on datasets including CIFAR-10, Imagenette, and Imagewoof. The attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset.
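The cancel-then-unleash mechanic can be seen in a toy setting. The sketch below, a minimal illustration rather than the paper's attack (which targets neural networks with clean-label poisons), uses a hypothetical 1-D nearest-centroid classifier: poison points drag one class centroid toward the target, camouflage points cancel that pull so the trojaned model behaves normally, and unlearning the camouflage flips the target's prediction. All quantities here are illustrative assumptions.

```python
import numpy as np

def predict(x, data, labels):
    """Nearest-centroid classifier: predict the class whose mean is closer."""
    c0, c1 = data[labels == 0].mean(), data[labels == 1].mean()
    return 0 if abs(x - c0) <= abs(x - c1) else 1

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.05, 50), rng.normal(1.0, 0.05, 50)])
y = np.array([0] * 50 + [1] * 50)
target = 0.45                             # test point the attacker wants flipped

# Poison drags centroid c1 toward the target; camouflage cancels the pull,
# so the mean of the 60 added class-1 points stays at ~1.0.
poison, camouflage = np.full(30, 0.2), np.full(30, 1.8)
X_adv = np.concatenate([X, poison, camouflage])
y_adv = np.concatenate([y, np.ones(60, dtype=int)])

print(predict(target, X, y))          # clean model:        0
print(predict(target, X_adv, y_adv))  # poison+camouflage:  0 (attack hidden)

# Unlearning request removes only the camouflage points (appended last).
print(predict(target, X_adv[:-30], y_adv[:-30]))  # after removal: 1 (unleashed)
```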
Unified lower bounds for interactive high-dimensional estimation under information constraints
Acharya, Jayadev, Canonne, Clément L., Sun, Ziteng, Tyagi, Himanshu
We consider the problem of parameter estimation under local information constraints, where the estimation algorithm has access to only limited information about each sample. These constraints can be of various types, including communication constraints, where each sample must be described using a few (e.g., constant number of) bits; (local) privacy constraints, where each sample is obtained from a different user and the users seek to reveal as little as possible about their specific data; as well as many others, e.g., noisy communication channels, or limited types of data access such as linear measurements. Such problems have received a lot of attention in recent years, motivated by applications such as data analytics in distributed systems and federated learning. Our main focus is on information-theoretic lower bounds for the minimax error rates (or, equivalently, the sample complexity) of these problems. Several recent works have provided different bounds that apply to specific constraints or work for specific parametric estimation problems, sometimes without allowing for interactive protocols. Indeed, handling interactive protocols is technically challenging, and several results in prior work exhibit flaws in their analysis. In particular, even the most basic Gaussian mean estimation problem using interactive communication remains, quite surprisingly, open. We present general, "plug-and-play" lower bounds for parametric estimation under information constraints that can be used for any local information constraint and allow for interactive protocols. Our abstract bound requires very simple (and natural) assumptions to hold for the underlying parametric family; in particular, we do not require technical "regularity" conditions that are common in asymptotic statistics.
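As a point of reference, one standard way to formalize the quantity being bounded (our notation, not necessarily the paper's): user $i$ holds a sample $X_i \sim \mathbf{p}_\theta$ and releases a message $Y_i$ through a channel $W_i$ chosen from the allowed family $\mathcal{W}$ (e.g., all $\ell$-bit quantizers, or all $\varepsilon$-LDP channels), where in an interactive protocol $W_i$ may depend on the earlier messages:

$$\mathcal{E}_n(\mathcal{W}) = \inf_{W_1, \dots, W_n \in \mathcal{W}} \inf_{\hat{\theta}} \sup_{\theta \in \Theta} \mathbb{E}\left[\ell\big(\hat{\theta}(Y_1, \dots, Y_n), \theta\big)\right], \qquad Y_i \sim W_i(\cdot \mid X_i; Y_1, \dots, Y_{i-1}).$$

A lower bound on $\mathcal{E}_n(\mathcal{W})$ that holds for every choice of channels in $\mathcal{W}$, including choices that depend on past messages, is what "allowing for interactive protocols" requires.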
Discrete Distribution Estimation under User-level Local Differential Privacy
Acharya, Jayadev, Liu, Yuhan, Sun, Ziteng
We study discrete distribution estimation under user-level local differential privacy (LDP). In user-level $\varepsilon$-LDP, each user has $m\ge1$ samples and the privacy of all $m$ samples must be preserved simultaneously. We resolve the following dilemma: While on the one hand having more samples per user should provide more information about the underlying distribution, on the other hand, guaranteeing the privacy of all $m$ samples should make the estimation task more difficult. We obtain tight bounds for this problem under almost all parameter regimes. Perhaps surprisingly, we show that in suitable parameter regimes, having $m$ samples per user is equivalent to having $m$ times more users, each with only one sample. Our results demonstrate interesting phase transitions for $m$ and the privacy parameter $\varepsilon$ in the estimation risk. Finally, connecting with recent results on shuffled DP, we show that combined with random shuffling, our algorithm leads to optimal error guarantees (up to logarithmic factors) under the central model of user-level DP in certain parameter regimes. We provide several simulations to verify our theoretical findings.
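For intuition about the single-sample baseline, the sketch below simulates the classical $m = 1$ primitive, $k$-ary randomized response with its standard debiasing step; this is the mechanism against which user-level algorithms are measured, not the paper's user-level mechanism.

```python
import numpy as np

def krr_estimate(samples, k, eps, rng):
    """k-ary randomized response + unbiased debiasing (one sample per user)."""
    n = len(samples)
    p = np.exp(eps) / (np.exp(eps) + k - 1)   # prob. of reporting truthfully
    q = 1.0 / (np.exp(eps) + k - 1)           # prob. of each specific other symbol
    keep = rng.random(n) < p
    other = rng.integers(0, k - 1, size=n)
    other += (other >= samples)               # uniform over the k-1 other symbols
    reports = np.where(keep, samples, other)
    freq = np.bincount(reports, minlength=k) / n
    return (freq - q) / (p - q)               # unbiased estimate of the pmf

rng = np.random.default_rng(1)
true_pmf = np.array([0.5, 0.25, 0.15, 0.1])
data = rng.choice(4, size=200_000, p=true_pmf)
print(np.round(krr_estimate(data, k=4, eps=1.0, rng=rng), 3))
```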
Robust Estimation for Random Graphs
Acharya, Jayadev, Jain, Ayush, Kamath, Gautam, Suresh, Ananda Theertha, Zhang, Huanyu
Finding underlying patterns and structure in data is a central task in machine learning and statistics. Typically, such structures are induced by modelling assumptions on the data generating procedure. While they offer mathematical convenience, real data generally does not match these idealized models, for reasons ranging from model misspecification to adversarial data poisoning. Thus, for learning algorithms to be effective in the wild, we require methods that are robust to deviations from the assumed model. With this motivation, we initiate the study of robust estimation for random graph models. Specifically, we will be concerned with the Erdős-Rényi (ER) random graph model [Gil59, ER59].
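As a concrete instance of the problem, consider estimating the edge parameter $p$ of $G(n, p)$ when an adversary rewires the edges incident to a small fraction of nodes. The sketch below is a naive baseline of our own, not the paper's estimator: the mean degree is badly corrupted, the median degree is less so, but even the median retains bias on the order of the corruption fraction, which is part of what makes the problem subtle.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, frac_bad = 500, 0.1, 0.05
A = np.triu(rng.random((n, n)) < p, k=1)    # ER(n, p) adjacency, upper triangle
A = A | A.T                                  # symmetrize

bad = rng.choice(n, size=int(frac_bad * n), replace=False)
A[bad, :] = True                             # adversary: corrupted nodes
A[:, bad] = True                             # connect to every other node
np.fill_diagonal(A, False)

deg = A.sum(axis=1)
print(deg.mean() / (n - 1))      # mean-based estimate: far above p = 0.1
print(np.median(deg) / (n - 1))  # median-based: closer, but still biased by ~frac_bad
```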
Remember What You Want to Forget: Algorithms for Machine Unlearning
Sekhari, Ayush, Acharya, Jayadev, Kamath, Gautam, Suresh, Ananda Theertha
We study the problem of forgetting datapoints from a learnt model. In this setting, the learner first receives a dataset $S$ drawn i.i.d. from an unknown distribution and outputs a predictor $w$ that performs well on unseen samples from that distribution. However, at some point in the future, a request may arrive to unlearn any training point $z \in S$, prompting the learner to modify its output predictor while still ensuring the same accuracy guarantees. We initiate a rigorous study of machine unlearning in the population setting, where the goal is to maintain performance on the unseen test loss. For convex loss functions, we provide an unlearning algorithm that can delete up to $O(n/d^{1/4})$ samples, where $d$ is the problem dimension. In comparison, differentially private learning (which implies unlearning) in general only guarantees deletion of $O(n/d^{1/2})$ samples. This shows that, in terms of the dependence of the deletion capacity on $d$, unlearning is at least polynomially more efficient than learning privately.
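The flavor of such guarantees is easiest to see in the simplest convex case. For ridge regression the minimizer has a closed form in the sufficient statistics $A = X^\top X + \lambda I$ and $b = X^\top y$, so a point can be unlearned exactly, and cheaply, by downdating those statistics. This is only a sketch of the idea; the paper handles general convex losses and adds noise so that the output is indistinguishable from retraining.

```python
import numpy as np

class RidgeUnlearner:
    """Exact unlearning for ridge regression via sufficient-statistic downdates."""

    def __init__(self, X, y, lam=1.0):
        self.A = X.T @ X + lam * np.eye(X.shape[1])  # X^T X + lam * I
        self.b = X.T @ y                             # X^T y

    def unlearn(self, x, y):
        """Remove one training point (x, y) without touching the rest of the data."""
        self.A -= np.outer(x, x)
        self.b -= y * x

    def weights(self):
        return np.linalg.solve(self.A, self.b)

rng = np.random.default_rng(3)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
model = RidgeUnlearner(X, y)
model.unlearn(X[0], y[0])
retrained = RidgeUnlearner(X[1:], y[1:])                  # retrain from scratch
print(np.allclose(model.weights(), retrained.weights()))  # True: exact unlearning
```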
Differentially Private Assouad, Fano, and Le Cam
Acharya, Jayadev, Sun, Ziteng, Zhang, Huanyu
Le Cam's method, Fano's inequality, and Assouad's lemma are three widely used techniques for proving lower bounds for statistical estimation tasks. We propose their analogues under central differential privacy. Our results are simple and easy to apply, and we use them to establish sample complexity bounds in several estimation tasks. We establish the optimal sample complexity of discrete distribution estimation under total variation distance and $\ell_2$ distance. We also provide lower bounds for several other distribution classes, including product distributions and Gaussian mixtures, that are tight up to logarithmic factors. The technical component of our paper relates coupling between distributions to the sample complexity of estimation under differential privacy.
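For reference, the classical two-point (Le Cam) bound that the paper privatizes: if $\ell$ is a metric and $\ell(\theta_1, \theta_2) \ge 2\delta$, then

$$\inf_{\hat{\theta}} \max_{i \in \{1, 2\}} \mathbb{E}_{P_{\theta_i}^n}\big[\ell(\hat{\theta}, \theta_i)\big] \ge \delta \big(1 - \mathrm{TV}(P_{\theta_1}^n, P_{\theta_2}^n)\big).$$

Roughly speaking, the private analogues add a second term to such bounds: if the two product distributions admit a coupling under which the datasets differ in few coordinates on average, then differential privacy forces any algorithm's outputs on them to remain close, yielding additional lower bounds that scale with the privacy parameter.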
Learning and Testing Causal Models with Interventions
Acharya, Jayadev, Bhattacharyya, Arnab, Daskalakis, Constantinos, Kandasamy, Saravanan
We consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009). Given a causal Bayesian network $M$ on a graph with $n$ discrete variables, bounded in-degree, and bounded "confounded components", we show that $O(\log n)$ interventions on an unknown causal Bayesian network $X$ on the same graph, and $O(n/\epsilon^2)$ samples per intervention, suffice to efficiently distinguish whether $X = M$ or whether there exists some intervention under which $X$ and $M$ are farther than $\epsilon$ in total variation distance. We also obtain sample/time/intervention efficient algorithms for: (i) testing the identity of two unknown causal Bayesian networks on the same graph; and (ii) learning a causal Bayesian network on a given graph. Although our algorithms are non-adaptive, we show that adaptivity does not help in general: $\Omega(\log n)$ interventions are necessary for testing the identity of two unknown causal Bayesian networks on the same graph, even adaptively. Our algorithms are enabled by a new subadditivity inequality for the squared Hellinger distance between two causal Bayesian networks.
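For context, the squared Hellinger distance that drives the analysis is, in our notation, $H^2(P, Q) = \frac{1}{2}\sum_x \big(\sqrt{P(x)} - \sqrt{Q(x)}\big)^2 = 1 - \sum_x \sqrt{P(x) Q(x)}$. It is well suited to testing because it tensorizes over product structure and sandwiches total variation distance:

$$H^2(P, Q) \le \mathrm{TV}(P, Q) \le \sqrt{2}\, H(P, Q).$$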
Optimal multiclass overfitting by sequence reconstruction from Hamming queries
Acharya, Jayadev, Suresh, Ananda Theertha
A primary concern with excessive reuse of test datasets in machine learning is that it can lead to overfitting. Multiclass classification was recently shown to be more resistant to overfitting than binary classification. In an open problem of COLT 2019, Feldman, Frostig, and Hardt ask to characterize the dependence of the overfitting bias on the number of classes $m$, the number of accuracy queries $k$, and the number of examples in the dataset $n$. We resolve this problem and determine the amount of overfitting possible in multiclass classification. We provide computationally efficient algorithms that achieve an overfitting bias of $\tilde{\Theta}(\max\{\sqrt{k/(mn)}, k/n\})$, matching the known upper bounds.
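The underlying query model is easy to demonstrate. The sketch below recovers an unknown binary labeling using $n + 1$ Hamming queries (a Hamming distance is just $n$ times one minus the accuracy of a submission) by flipping one coordinate at a time; it illustrates the reduction from overfitting to sequence reconstruction, not the paper's query-optimal multiclass scheme.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 16
secret = rng.integers(0, 2, size=n)       # hidden test labels

def hamming_query(guess):
    """Oracle: Hamming distance to the secret (i.e., n * (1 - accuracy))."""
    return int(np.sum(guess != secret))

guess = np.zeros(n, dtype=int)
dist = hamming_query(guess)               # query 1
for i in range(n):                        # queries 2 .. n+1
    flipped = guess.copy()
    flipped[i] = 1
    d = hamming_query(flipped)
    if d < dist:                          # distance dropped, so secret[i] == 1
        guess, dist = flipped, d

print(np.array_equal(guess, secret))      # True: reconstruction in n+1 queries
```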
Distributed Learning with Sublinear Communication
Acharya, Jayadev, De Sa, Christopher, Foster, Dylan J., Sridharan, Karthik
In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine. This model has received substantial interest in machine learning due to its scalability and potential for parallel speedup. However, in high-dimensional settings, where the number of examples is smaller than the number of features ("dimension"), the speedup afforded by distributed learning may be overshadowed by the cost of communicating a single example. This paper investigates the following question: When is it possible to learn a $d$-dimensional model in the distributed setting with total communication sublinear in $d$? Starting with a negative result, we show that for learning $\ell_1$-bounded or sparse linear models, no algorithm can obtain optimal error until communication is linear in dimension. Our main result is that by slightly relaxing the standard boundedness assumptions for linear models, we can obtain distributed algorithms that enjoy optimal error with communication logarithmic in dimension. This result is based on a family of algorithms that combine mirror descent with randomized sparsification/quantization of iterates, and extends to the general stochastic convex optimization model.
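A core primitive behind such algorithms is an unbiased randomized sparsifier: transmit only a few coordinates of an iterate, rescaled so that the expectation is preserved, so that per-round communication is governed by the sparsity budget rather than by $d$ (roughly, each transmitted coordinate index costs $O(\log d)$ bits). Below is a minimal sketch of one standard primitive of this kind, magnitude-proportional importance sampling; it is our illustration, not necessarily the paper's exact scheme.

```python
import numpy as np

def sparsify(x, budget, rng):
    """Unbiased sparsification: keep coord i w.p. p_i ~ |x_i|, rescale by 1/p_i."""
    probs = np.minimum(1.0, budget * np.abs(x) / np.abs(x).sum())
    keep = rng.random(len(x)) < probs
    out = np.zeros_like(x)
    out[keep] = x[keep] / probs[keep]   # E[out] = x, E[#nonzeros] <= budget
    return out

rng = np.random.default_rng(5)
x = rng.normal(size=1000)
print(np.count_nonzero(sparsify(x, budget=50, rng=rng)))  # ~50 coords transmitted
est = np.mean([sparsify(x, budget=50, rng=rng) for _ in range(2000)], axis=0)
print(np.mean(np.abs(est - x)))  # small: the sparsifier is unbiased
```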