Aliakbarpour, Maryam
Better Private Distribution Testing by Leveraging Unverified Auxiliary Data
Aliakbarpour, Maryam, Burudgunte, Arnav, Canonne, Clément, Rubinfeld, Ronitt
Accurately analyzing data while preserving individual privacy is a fundamental challenge in statistical inference. Since its formulation nearly two decades ago, Differential Privacy (DP) [DMNS06] has emerged as the leading framework for privacy-preserving data analysis, providing strong mathematical privacy guarantees and gaining adoption by major entities such as the U.S. Census Bureau, Amazon [Ama24], Google [EPK14], Microsoft [DKY17], and Apple [Dif17; TVVKFSD17]. Unfortunately, DP guarantees often come at the cost of increased data requirements or computational resources, which has limited the widespread adoption of differential privacy in spite of its theoretical appeal. To address this issue, a recent line of work has investigated whether access to even small amounts of additional public data could help mitigate this loss of performance. Promising results for various tasks have been shown, both experimentally [KST20; LLHR24; BZHZK24; DORKSF24] and theoretically [BKS22; BBCKS23]. The use of additional auxiliary information is very enticing, as such access is available in many real-world applications: for example, hospitals handling sensitive patient data might leverage public datasets, records from different periods or locations, or synthetic data generated by machine learning models to improve analysis. Similarly, medical or socio-economic studies focusing on a minority or protected group can leverage statistical data from the overall population. However, integrating public data introduces its own challenges, as it often lacks guarantees regarding its accuracy or relevance to private datasets.
Privacy in Metalearning and Multitask Learning: Modeling and Separations
Aliakbarpour, Maryam, Bairaktari, Konstantina, Smith, Adam, Swanberg, Marika, Ullman, Jonathan
Model personalization allows a set of individuals, each facing a different learning task, to train models that are more accurate for each person than those they could develop individually. For example, consider a set of people, each of whom holds a relatively small dataset of photographs labeled with the names of their loved ones that appear in each picture. Each person would like to build a classifier that labels future pictures with the names of people in the picture, but training such an image classifier would take more data than any individual person has. Even though the tasks they want to carry out are different--their photos have different subjects--those tasks share a lot of common structure. By pooling their data, a large group of people could learn the shared components of a good set of classifiers. Each individual could then train the subject-specific components on their own, requiring only a few examples for each subject. Other applications of personalization include next-word prediction on a mobile keyboard, speech recognition, and recommendation systems. The goals of personalization are captured in a variety of formal frameworks, such as multitask learning and metalearning.
Optimal Algorithms for Augmented Testing of Discrete Distributions
Aliakbarpour, Maryam, Indyk, Piotr, Rubinfeld, Ronitt, Silwal, Sandeep
We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution $p$, extensive research has established optimal bounds for uniformity testing, identity testing (goodness of fit), and closeness testing (equivalence or two-sample testing). We explore these problems in a setting where a predicted data distribution, possibly derived from historical data or predictive machine learning models, is available. We demonstrate that such a predictor can indeed reduce the number of samples required for all three property testing tasks. The reduction in sample complexity depends directly on the predictor's quality, measured by its total variation distance from $p$. A key advantage of our algorithms is their adaptability to the precision of the prediction. Specifically, our algorithms can self-adjust their sample complexity based on the accuracy of the available prediction, operating without any prior knowledge of the prediction's accuracy (i.e., they are consistent). Additionally, we never use more samples than the standard approaches require, even if the predictions provide no meaningful information (i.e., they are also robust). We provide lower bounds indicating that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal. Furthermore, experimental results show that the performance of our algorithms on real data significantly exceeds our worst-case guarantees for sample complexity, demonstrating the practicality of our approach.
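For context, the baselines that these augmented testers never exceed are the known optimal sample complexities in the standard, prediction-free model: over a domain of size $n$ and with distance parameter $\epsilon$,
\[
\Theta\!\left(\sqrt{n}/\epsilon^2\right) \text{ samples for uniformity and identity testing, and } \Theta\!\left(\max\!\left(n^{2/3}/\epsilon^{4/3},\ \sqrt{n}/\epsilon^2\right)\right) \text{ samples for closeness testing.}
\]
These bounds come from the prior work the abstract refers to, and are stated here only as the point of comparison.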
Enhancing Feature-Specific Data Protection via Bayesian Coordinate Differential Privacy
Aliakbarpour, Maryam, Chaudhuri, Syomantak, Courtade, Thomas A., Fallah, Alireza, Jordan, Michael I.
Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties. However, LDP applies uniform protection to all data features, including less sensitive ones, which degrades performance of downstream tasks. To overcome this limitation, we propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification. This more nuanced approach complements LDP by adjusting privacy protection according to the sensitivity of each feature, enabling improved performance of downstream tasks without compromising privacy. We characterize the properties of BCDP and articulate its connections with standard non-Bayesian privacy frameworks. We further apply our BCDP framework to the problems of private mean estimation and ordinary least-squares regression. The BCDP-based approach obtains improved accuracy compared to a purely LDP-based approach, without compromising on privacy.
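As a simple illustration (a standard per-feature baseline, not the BCDP mechanism itself), feature-specific protection can be realized by giving each coordinate its own privacy budget and adding independently scaled Laplace noise:
\[
M(x)_j = x_j + \mathrm{Lap}\!\left(\Delta_j/\epsilon_j\right), \qquad j = 1, \dots, d,
\]
where $\Delta_j$ is the range of feature $j$ and a larger budget $\epsilon_j$ (weaker protection) is assigned to a less sensitive feature. BCDP refines this kind of uniform-per-coordinate guarantee with a Bayesian, sensitivity-aware quantification.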
Metalearning with Very Few Samples Per Task
Aliakbarpour, Maryam, Bairaktari, Konstantina, Brown, Gavin, Smith, Adam, Ullman, Jonathan
Metalearning and multitask learning are two frameworks for solving a group of related learning tasks more efficiently than we could hope to solve each of the individual tasks on their own. In multitask learning, we are given a fixed set of related learning tasks and need to output one accurate model per task, whereas in metalearning we are given tasks that are drawn i.i.d. from a metadistribution and need to output some common information that can be easily specialized to new, previously unseen tasks from the metadistribution. In this work, we consider a binary classification setting where tasks are related by a shared representation, that is, every task $P$ of interest can be solved by a classifier of the form $f_{P} \circ h$ where $h \in H$ is a map from features to some representation space that is shared across tasks, and $f_{P} \in F$ is a task-specific classifier from the representation space to labels. The main question we ask in this work is how much data do we need to metalearn a good representation? Here, the amount of data is measured in terms of both the number of tasks $t$ that we need to see and the number of samples $n$ per task. We focus on the regime where the number of samples per task is extremely small. Our main result shows that, in a distribution-free setting where the feature vectors are in $\mathbb{R}^d$, the representation is a linear map from $\mathbb{R}^d \to \mathbb{R}^k$, and the task-specific classifiers are halfspaces in $\mathbb{R}^k$, we can metalearn a representation with error $\varepsilon$ using just $n = k+2$ samples per task, and $d \cdot (1/\varepsilon)^{O(k)}$ tasks. Learning with so few samples per task is remarkable because metalearning would be impossible with $k+1$ samples per task, and because we cannot even hope to learn an accurate task-specific classifier with just $k+2$ samples per task.
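Concretely, in the linear-representation setting of the main result, writing the shared map as a matrix $B \in \mathbb{R}^{k \times d}$ and the task-specific halfspace as $w_P \in \mathbb{R}^k$ (notation introduced here for illustration, and omitting any possible threshold term), each task's classifier takes the form
\[
x \mapsto f_P(h(x)) = \mathrm{sign}\!\left(\langle w_P, Bx \rangle\right),
\]
with $B$ learned jointly from all tasks and $w_P$ fit from the $n = k+2$ samples of task $P$.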
Differentially Private Medians and Interior Points for Non-Pathological Data
Aliakbarpour, Maryam, Silver, Rose, Steinke, Thomas, Ullman, Jonathan
A statistical estimator is an algorithm that takes data drawn from an unknown distribution as input and tries to learn something about that distribution. While the input data is only a conduit for learning about the distribution, many statistical estimators also reveal a lot of information that is specific to the input data, which raises concerns about the privacy of people who contributed their data. In response, we can try to design estimators that are differentially private (DP) [DMNS06], which ensure that no attacker can infer much more about any person in the input data than they could have inferred in a hypothetical world where that person's data had never been collected. Differential privacy is a strong constraint that imposes significant costs even for very simple statistical estimation tasks. In this paper we focus on two such tasks: interior point estimation and median estimation. In the interior point problem, we have a distribution $P$ over $\mathbb{R}$, and our goal is simply to output some point $x$ with $\inf \operatorname{supp}(P) \le x \le \sup \operatorname{supp}(P)$.
Local Differential Privacy Is Equivalent to Contraction of $E_\gamma$-Divergence
Asoodeh, Shahab, Aliakbarpour, Maryam, Calmon, Flavio P.
We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_\gamma$-divergence. We then use this equivalent formulation to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences. When combined with standard estimation-theoretic tools (such as Le Cam's and Fano's converse methods), this result allows us to study the trade-off between privacy and utility in several hypothesis testing problems and in minimax and Bayesian estimation problems.
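For reference, the $E_\gamma$-divergence (also known as the hockey-stick divergence) between distributions $P$ and $Q$ is
\[
E_\gamma(P \,\|\, Q) = \sup_{A} \big(P(A) - \gamma\, Q(A)\big), \qquad \gamma \ge 1,
\]
and a mechanism $K$ satisfies $\epsilon$-LDP exactly when $E_{e^\epsilon}(K(\cdot \mid x) \,\|\, K(\cdot \mid x')) = 0$ for every pair of inputs $x, x'$; the paper's result recasts this pointwise characterization in terms of the contraction coefficient of $E_\gamma$.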
Testing Tail Weight of a Distribution Via Hazard Rate
Aliakbarpour, Maryam, Biswas, Amartya Shankha, Ravichandran, Kavya, Rubinfeld, Ronitt
For many computational problems, the distribution of data affects the algorithms we choose to solve them. One important attribute of a distribution is the shape of its "tail", that is, the behavior of the distribution as it moves away from the mean. In some distributions, the frequencies of elements far from the mean drop very quickly ("light-tailed"), while in others, the frequencies of large elements drop more slowly ("heavy-tailed"); consequently, in such "heavy-tailed" distributions, more samples are needed to see important elements. For example, Harchol-Balter [HB99] has shown that the performance of policies for scheduling computing jobs varies dramatically depending on whether the distribution of job workloads is heavy-tailed or light-tailed. The shape of the distribution has also influenced the design and analysis of learning algorithms; three recent examples include the classification algorithm of [WRH17], the generalization bound of [Fel20], and the frequency estimation result of [HIKV18] (where the latter two are for long-tailed and Zipfian distributions, respectively, both subclasses of heavy-tailed distributions).
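The hazard rate in the title is the standard notion: for a distribution with density $f$ and CDF $F$,
\[
h(x) = \frac{f(x)}{1 - F(x)}
\]
(with the analogous ratio of probability mass to tail mass in the discrete case); roughly speaking, an eventually increasing hazard rate indicates a light tail, while a decaying hazard rate indicates a heavy one.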
Testing Determinantal Point Processes
Gatmiry, Khashayar, Aliakbarpour, Maryam, Jegelka, Stefanie
Determinantal point processes (DPPs) are a rich class of discrete probability distributions that were first studied in the context of quantum physics [54] and random matrix theory [30]. Initiated by the seminal work of Kulesza and Taskar [46], DPPs have gained a lot of attention in machine learning due to their ability to naturally capture notions of diversity and repulsion. Moreover, they are easy to define via a similarity (kernel) matrix, and, as opposed to many other probabilistic models, offer tractable exact algorithms for marginalization, conditioning, and sampling [5, 42, 46, 51]. Therefore, DPPs have been explored in a wide range of applications, including video summarization [38, 39], image search [2, 45], document and timeline summarization [53], recommendation [69], feature selection in bioinformatics [9], modeling neurons [63], and matrix approximation [22, 23, 50]. A determinantal point process is a distribution over the subsets of a ground set $[n] = \{1, 2, \ldots, n\}$, parameterized by a marginal kernel matrix $K \in \mathbb{R}^{n \times n}$.
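Concretely, if $Y \subseteq [n]$ is a random subset drawn from the DPP with marginal kernel $K$, then for every fixed $S \subseteq [n]$,
\[
\Pr[S \subseteq Y] = \det(K_S),
\]
where $K_S$ denotes the principal submatrix of $K$ indexed by $S$.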
Testing Mixtures of Discrete Distributions
Aliakbarpour, Maryam, Kumar, Ravi, Rubinfeld, Ronitt
There has been significant study of the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size $n$, distinguishing the uniform distribution from distributions that are far from uniform in $\ell_1$-distance uses only $O(\sqrt{n})$ samples. However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish whether samples are coming from a distribution that is $\epsilon$-close to uniform or from a distribution that is $(1-\epsilon)$-far from uniform. The latter task requires a number of samples nearly linear in $n$ [Valiant 2008, Valiant and Valiant 2011]. In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known a priori. Focusing on the identity and closeness testing problems leads to the following mixture testing question: given samples from distributions $p, q_1, q_2$, can we test whether $p$ is a mixture of $q_1$ and $q_2$? We consider this general question in various scenarios that differ in terms of how the tester can access the distributions, and show that this problem is indeed more tractable. Our results show that the sample complexity of our testers is exactly the same as in the classical non-mixture case.
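In this notation, the mixture testing question asks the tester to distinguish whether there exists $\alpha \in [0,1]$ with $p = \alpha q_1 + (1-\alpha) q_2$, or whether
\[
\min_{\alpha \in [0,1]} d_{\mathrm{TV}}\big(p,\ \alpha q_1 + (1-\alpha) q_2\big) > \epsilon,
\]
i.e., $p$ is $\epsilon$-far from every mixture of $q_1$ and $q_2$ (a natural formalization stated here for illustration; the paper's exact farness measure may differ in normalization).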