One of the wonders of machine learning is that it turns any kind of data into mathematical equations. Once you train a machine learning model on training examples--whether it's on images, audio, raw text, or tabular data--what you get is a set of numerical parameters. In most cases, the model no longer needs the training dataset and uses the tuned parameters to map new and unseen examples to categories or value predictions. You can then discard the training data and publish the model on GitHub or run it on your own servers without worrying about storing or distributing sensitive information contained in the training dataset. But a type of attack called "membership inference" makes it possible to detect the data used to train a machine learning model.
Membership inference (MI) attacks exploit a learned model's lack of generalization to infer whether a given sample was in the model's training set. Known MI attacks generally work by casting the attacker's goal as a supervised learning problem, training an attack model from predictions generated by the target model, or by others like it. However, we find that these attacks do not often provide a meaningful basis for confidently inferring training set membership, as the attack models are not well-calibrated. Moreover, these attacks do not significantly outperform a trivial attack that predicts that a point is a member if and only if the model correctly predicts its label. In this work we present well-calibrated MI attacks that allow the attacker to accurately control the minimum confidence with which positive membership inferences are made. Our attacks take advantage of white-box information about the target model and leverage new insights about how overfitting occurs in deep neural networks; namely, we show how a model's idiosyncratic use of features can provide evidence for membership. Experiments on seven real-world datasets show that our attacks support calibration for high-confidence inferences, while outperforming previous MI attacks in terms of accuracy. Finally, we show that our attacks achieve non-trivial advantage on some models with low generalization error, including those trained with small-epsilon-differential privacy; for large-epsilon (epsilon=16, as reported in some industrial settings), the attack performs comparably to unprotected models.
While being deployed in many critical applications as core components, machine learning (ML) models are vulnerable to various security and privacy attacks. One major privacy attack in this domain is membership inference, where an adversary aims to determine whether a target data sample is part of the training set of a target ML model. So far, most of the current membership inference attacks are evaluated against ML models trained from scratch. However, real-world ML models are typically trained following the transfer learning paradigm, where a model owner takes a pretrained model learned from a different dataset, namely teacher model, and trains her own student model by fine-tuning the teacher model with her own data. In this paper, we perform the first systematic evaluation of membership inference attacks against transfer learning models. We adopt the strategy of shadow model training to derive the data for training our membership inference classifier. Extensive experiments on four real-world image datasets show that membership inference can achieve effective performance. For instance, on the CIFAR100 classifier transferred from ResNet20 (pretrained with Caltech101), our membership inference achieves $95\%$ attack AUC. Moreover, we show that membership inference is still effective when the architecture of target model is unknown. Our results shed light on the severity of membership risks stemming from machine learning models in practice.
Recent studies propose membership inference (MI) attacks on deep models. Despite the moderate accuracy of such MI attacks, we show that the way the attack accuracy is reported is often misleading and a simple blind attack which is highly unreliable and inefficient in reality can often represent similar accuracy. We show that the current MI attack models can only identify the membership of misclassified samples with mediocre accuracy at best, which only constitute a very small portion of training samples. We analyze several new features that have not been explored for membership inference before, including distance to the decision boundary and gradient norms, and conclude that deep models' responses are mostly indistinguishable among train and non-train samples. Moreover, in contrast with general intuition that deeper models have a capacity to memorize training samples, and, hence, they are more vulnerable to membership inference, we find no evidence to support that and in some cases deeper models are often harder to launch membership inference attack on. Furthermore, despite the common belief, we show that overfitting does not necessarily lead to higher degree of membership leakage. We conduct experiments on MNIST, CIFAR-10, CIFAR-100, and ImageNet, using various model architecture, including LeNet, ResNet, DenseNet, InceptionV3, and Xception.
Abstract--Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We perform a comprehensive analysis of white-box privacy inference attacks on deep learning models. We measure the privacy leakage by leveraging the final model parameters as well as the parameter updates during the training and finetuning processes.We design the attacks in the stand-alone and federated settings, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to measure their training data membership leakage. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, widely used to train deep neural networks. We show that even well-generalized models are significantly susceptible towhite-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants of a federated learning setting can run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies. I. INTRODUCTION Machine learning models based on deep neural networks have been shown to have significantly high generalization accuracies for various learning tasks, from image and speech recognition to generating realistic-looking data. This success has led to many applications and services that use deep learning algorithms on large-dimension (and potentially sensitive) userdata, including user speeches, images, and medical, financial, social, and location data points. The crucial question we ask in this paper is the following: How much is the privacy risks of deep learning for individuals whose data is used as part of the training set? In other words, how much is the information leakage of deep learning algorithms about their individual training data samples? We define privacy-sensitive leakage of a model about a set of target trained data record(s) as the information that an adversary can infer about the data records, that could not have been inferred from similar models not using the target data records. This distinguishes between the information that we can learn from the model about the population and the information that it leaks about particular samples from the population that were in its training set. The former indicates utility gain, and the later reflects privacy loss. We design inference attacks to quantify such privacy leakage. Theinference attacks fall into two fundamental and related categories: tracing (a.k.a.