Collaborating Authors

Dziedzic, Adam


Have it your way: Individualized Privacy Assignment for DP-SGD

arXiv.org Artificial Intelligence

When training a machine learning model with differential privacy, one sets a privacy budget. This budget represents a maximal privacy violation that any user is willing to face by contributing their data to the training set. We argue that this approach is limited because different users may have different privacy expectations. Thus, setting a uniform privacy budget across all points may be overly conservative for some users or, conversely, not sufficiently protective for others. In this paper, we capture these preferences through individualized privacy budgets. To demonstrate their practicality, we introduce a variant of Differentially Private Stochastic Gradient Descent (DP-SGD) which supports such individualized budgets. DP-SGD is the canonical approach to training models with differential privacy. We modify its data sampling and gradient noising mechanisms to arrive at our approach, which we call Individualized DP-SGD (IDP-SGD). Because IDP-SGD provides privacy guarantees tailored to the preferences of individual users and their data points, we find it empirically improves privacy-utility trade-offs.
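To make the sampling-based modification concrete, here is a minimal sketch on a toy logistic-regression task: each point receives its own Poisson sampling rate derived from its budget, and the rest follows standard DP-SGD (per-example clipping, Gaussian noising of the summed gradient). The linear budget-to-rate mapping, the toy data, and all hyperparameters are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: logistic regression on 2-D points.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Hypothetical per-point privacy budgets; a looser budget earns a higher
# Poisson sampling rate (this linear mapping is an illustrative
# assumption, not the paper's exact calibration).
budgets = rng.choice([1.0, 2.0, 3.0], size=len(X))
rates = 0.05 * budgets / budgets.max()

clip_norm, sigma, lr = 1.0, 1.0, 0.5
w = np.zeros(2)

for step in range(200):
    # Individualized Poisson subsampling: each point joins the batch
    # with its own probability rates[i].
    batch = rng.random(len(X)) < rates
    Xb, yb = X[batch], y[batch]
    if len(Xb) == 0:
        continue
    # Per-example logistic-loss gradients.
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    grads = (p - yb)[:, None] * Xb
    # Clip each example's gradient to bound its contribution.
    scale = np.maximum(np.linalg.norm(grads, axis=1) / clip_norm, 1.0)
    grads /= scale[:, None]
    # Noise the summed gradient as in standard DP-SGD, then normalize
    # by the expected batch size.
    noisy_sum = grads.sum(axis=0) + rng.normal(scale=sigma * clip_norm, size=2)
    w -= lr * noisy_sum / rates.sum()
```

The abstract's other modification, individualized gradient noising, would analogously assign per-point noise scales rather than per-point sampling rates.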


Dataset Inference for Self-Supervised Models

arXiv.org Artificial Intelligence

Self-supervised models are increasingly prevalent in machine learning (ML) since they reduce the need for expensively labeled data. Because of their versatility in downstream applications, they are increasingly used as a service exposed via public APIs. At the same time, these encoder models are particularly vulnerable to model stealing attacks due to the high dimensionality of vector representations they output. Yet, encoders remain undefended: existing mitigation strategies for stealing attacks focus on supervised learning. We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing. The intuition is that the log-likelihood of an encoder's output representations is higher on the victim's training data than on test data if it is stolen from the victim, but not if it is independently trained. We compute this log-likelihood using density estimation models. As part of our evaluation, we also propose measuring the fidelity of stolen encoders and quantifying the effectiveness of the theft detection without involving downstream tasks; instead, we leverage mutual information and distance measurements. Our extensive empirical results in the vision domain demonstrate that dataset inference is a promising direction for defending self-supervised models against model stealing.
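As a rough illustration of this log-likelihood test, the sketch below fits a Gaussian mixture model (standing in for the paper's density estimator) on representations of the victim's training data, then compares the log-likelihoods of held-out training points against test points with a one-sided t-test. The helper `dataset_inference_score` and the random-projection "encoder" in the usage example are hypothetical names for illustration, not the paper's pipeline.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.mixture import GaussianMixture

def dataset_inference_score(encode, train_X, test_X, n_components=8, seed=0):
    """`encode` maps inputs to the suspect encoder's representations;
    train_X is the victim's private training data, test_X is held-out
    data from the same distribution. (Both names are illustrative.)"""
    train_reps, test_reps = encode(train_X), encode(test_X)
    # Fit a density model on one half of the training representations;
    # a GMM stands in for the paper's density estimator.
    half = len(train_reps) // 2
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(train_reps[:half])
    # A stolen encoder should assign higher log-likelihood to the
    # victim's (held-out) training points than to test points.
    ll_train = gmm.score_samples(train_reps[half:])
    ll_test = gmm.score_samples(test_reps)
    t, p = ttest_ind(ll_train, ll_test, alternative="greater")
    return t, p  # small p-value -> evidence of stealing

# Toy usage with a random linear map standing in for an encoder.
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))
encode = lambda X: X @ W
print(dataset_inference_score(encode, rng.normal(size=(600, 32)),
                              rng.normal(size=(600, 32))))
```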


Private Multi-Winner Voting for Machine Learning

arXiv.org Artificial Intelligence

Private multi-winner voting is the task of revealing $k$-hot binary vectors satisfying a bounded differential privacy (DP) guarantee. This task has been understudied in machine learning literature despite its prevalence in many domains such as healthcare. We propose three new DP multi-winner mechanisms: Binary, $\tau$, and Powerset voting. Binary voting operates independently per label through composition. $\tau$ voting bounds votes optimally in their $\ell_2$ norm for tight data-independent guarantees. Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set. Our theoretical and empirical analysis shows that Binary voting can be a competitive mechanism on many tasks unless there are strong correlations between labels, in which case Powerset voting outperforms it. We use our mechanisms to enable privacy-preserving multi-label learning in the central setting by extending the canonical single-label technique: PATE. We find that our techniques outperform current state-of-the-art approaches on large, real-world healthcare data and standard multi-label benchmarks. We further enable multi-label confidential and private collaborative (CaPC) learning and show that model performance can be significantly improved in the multi-site setting.
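A minimal sketch of the Binary mechanism on synthetic teacher votes follows; the Gaussian noise scale and the noisy-majority threshold are illustrative assumptions. (The $\tau$ and Powerset mechanisms instead privatize the $\ell_2$-clipped vote vector and a distribution over entire binary vectors, respectively.)

```python
import numpy as np

def binary_voting(votes, sigma=5.0, seed=0):
    """Sketch of the Binary mechanism: aggregate teachers' k-hot
    predictions one label at a time, privatizing each label's
    positive-vote count with Gaussian noise; the per-label guarantees
    then compose across labels.

    votes: (n_teachers, n_labels) binary matrix of teacher predictions.
    Returns a k-hot prediction for the query.
    """
    rng = np.random.default_rng(seed)
    n_teachers, n_labels = votes.shape
    counts = votes.sum(axis=0).astype(float)     # positive votes per label
    noisy = counts + rng.normal(scale=sigma, size=n_labels)
    return (noisy > n_teachers / 2).astype(int)  # noisy majority per label

# Toy usage: 50 teachers, 10 labels, strong agreement on label 3.
rng = np.random.default_rng(1)
votes = (rng.random((50, 10)) < 0.2).astype(int)
votes[:, 3] = 1
print(binary_voting(votes))
```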


Increasing the Cost of Model Extraction with Calibrated Proof of Work

arXiv.org Artificial Intelligence

In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and adjusting their own model based on the obtained predictions. To prevent model stealing, existing defenses focus on detecting malicious queries or on truncating or distorting outputs, thus necessarily introducing a tradeoff between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof-of-work before they can read the model's predictions. This deters attackers by greatly increasing (even up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof-of-work to each query, this only introduces a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.

Model extraction attacks (Tramèr et al., 2016; Jagielski et al., 2020; Zanella-Beguelin et al., 2021) are a threat to the confidentiality of machine learning (ML) models. They are also used as reconnaissance prior to mounting other attacks, for example, when an adversary wishes to disguise a spam message to get it past a target spam filter (Lowd & Meek, 2005) or to generate adversarial examples (Biggio et al., 2013; Szegedy et al., 2014) using the extracted model (Papernot et al., 2017b). Furthermore, an adversary can extract a functionally similar model even without access to any real input training data (Krishna et al., 2020; Truong et al., 2021; Miura et al., 2021), bypassing the long and expensive process of procuring, cleaning, and preprocessing data. This harms the interests of the model owner and infringes on their intellectual property.

Defenses against model extraction can be categorized as active, passive, or reactive. Passive defenses try to detect an attack (Juuti et al., 2019) or truncate outputs (Tramèr et al., 2016), but these methods lower the quality of results for legitimate users. The main reactive defenses against model extraction attacks are watermarking (Jia et al., 2020b), dataset inference (Maini et al., 2021), and proof of learning (Jia et al., 2021). However, reactive approaches address model extraction post hoc, i.e., after the attack has been completed. We design a proactive defense that prevents model stealing before it succeeds. Specifically, we aim to increase the computational cost of model extraction without lowering the quality of model outputs. Our method is based on the concept of proof-of-work (PoW); its main steps are presented as a block diagram in Figure 1.
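The sketch below illustrates the protocol with a hashcash-style puzzle whose difficulty grows with an estimated per-query information cost. The mapping `difficulty_for_cost` and the example cost value are placeholders; in the paper, the per-query cost comes from differential privacy tooling rather than a fixed constant.

```python
import hashlib
import os

def difficulty_for_cost(info_cost, base_bits=10, scale=4):
    """Hypothetical calibration: map an estimated information cost of a
    query (in the paper, measured with differential privacy tools) to a
    number of leading zero bits. Higher cost -> harder puzzle."""
    return base_bits + int(scale * info_cost)

def solve_pow(challenge: bytes, bits: int) -> int:
    """Hashcash-style search: find a nonce whose SHA-256 digest with the
    challenge falls below the target implied by `bits` zero bits."""
    target = 1 << (256 - bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify_pow(challenge: bytes, nonce: int, bits: int) -> bool:
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))

# Server side: issue a challenge calibrated to the query's estimated
# cost; answer the query only after a valid nonce comes back.
challenge = os.urandom(16)
bits = difficulty_for_cost(info_cost=1.5)  # illustrative cost estimate
nonce = solve_pow(challenge, bits)         # client-side work
assert verify_pow(challenge, nonce, bits)
```

A legitimate user issuing low-information queries would receive easy puzzles (few leading zero bits), while an attacker's information-rich queries would trigger exponentially more hashing work, since each additional bit doubles the expected search time.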