Evaluating and testing unintended memorization in neural networks

Robohub

Defining memorization rigorously requires thought. On average, models are less surprised by (and assign a higher likelihood to) the data they were trained on. At the same time, any language model trained on English will assign a much higher likelihood to the phrase "Mary had a little lamb" than to the phrase "correct horse battery staple", even if the former never appeared in the training data and the latter did. To separate these confounding factors, rather than reasoning about the likelihood of natural phrases, we perform a controlled experiment: given the standard Penn Treebank (PTB) dataset, we insert the canary phrase "the random number is 281265017" at a random location.
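
To make the measurement concrete, here is a minimal sketch of this kind of canary test. It assumes a hypothetical model_log_perplexity scoring function (lower means the model finds the text more likely) and follows the general idea of ranking the inserted secret against random alternatives that were never inserted; it is an illustration, not the blog post's exact procedure.

```python
import math
import random

# Hypothetical sketch of the canary experiment described above: insert a
# secret like "the random number is 281265017" once into the training text,
# train the language model as usual, then compare the model's log-perplexity
# on the inserted secret against many alternative secrets that were NOT
# inserted. `model_log_perplexity` is an assumed scoring function.
def exposure(model_log_perplexity, inserted_secret, n_candidates=10000):
    """Exposure = log2(n_candidates) - log2(rank of the inserted secret)."""
    candidates = [f"the random number is {random.randint(0, 999999999):09d}"
                  for _ in range(n_candidates)]
    scores = [model_log_perplexity(c) for c in candidates]
    inserted_score = model_log_perplexity(inserted_secret)
    # Rank of the inserted secret among the non-inserted candidates
    # (1 = the model considers it more likely than every alternative).
    rank = 1 + sum(1 for s in scores if s < inserted_score)
    return math.log2(len(candidates)) - math.log2(rank)
```

An exposure close to log2 of the candidate count means the model ranks the inserted canary above essentially every alternative secret, i.e., it has memorized it.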



Introducing TensorFlow Privacy: Learning with Differential Privacy for Training Data

#artificialintelligence

Today, we're excited to announce TensorFlow Privacy (GitHub), an open source library that makes it easier not only for developers to train machine-learning models with privacy, but also for researchers to advance the state of the art in machine learning with strong privacy guarantees. Modern machine learning is increasingly applied to create amazing new technologies and user experiences, many of which involve training machines to learn responsibly from sensitive data, such as personal photos or email. Ideally, the parameters of trained machine-learning models should encode general patterns rather than facts about specific training examples. To ensure this, and to give strong privacy guarantees when the training data is sensitive, it is possible to use techniques based on the theory of differential privacy. In particular, when training on users' data, those techniques offer strong mathematical guarantees that models do not learn or remember the details about any specific user.
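
As a rough illustration of what training with such guarantees looks like in practice, here is a minimal sketch using the library's DPKerasSGDOptimizer. The hyperparameter values are placeholders, and the exact import path and argument names may differ across library versions; consult the TensorFlow Privacy documentation for your release.

```python
import tensorflow as tf
import tensorflow_privacy

# Sketch: wrap a standard Keras training setup with a differentially private
# optimizer. Assumes tensorflow_privacy exposes DPKerasSGDOptimizer with
# these parameters (placeholder values shown).
optimizer = tensorflow_privacy.DPKerasSGDOptimizer(
    l2_norm_clip=1.0,        # clip each microbatch gradient to bound any one example's influence
    noise_multiplier=1.1,    # Gaussian noise added to the clipped, aggregated gradients
    num_microbatches=32,     # split each batch so clipping is (close to) per-example
    learning_rate=0.15,
)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(28 * 28,)),
    tf.keras.layers.Dense(10),
])

# A non-reduced (per-example) loss is needed so gradients can be clipped per microbatch.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
```

Clipping bounds any single example's contribution to each update, and the added noise is what yields the differential-privacy guarantee; the resulting privacy budget can then be accounted for from the noise multiplier, batch size, and number of training steps.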


A Closer Look at Memorization in Deep Networks

arXiv.org Machine Learning

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that, with appropriately tuned explicit regularization (e.g., dropout), we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.
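
A hypothetical sketch of the noise-vs.-real-data comparison described above: train two identical models, one on true labels and one on uniformly random labels, and compare how quickly each is fit. The dataset, architecture, and epoch count here are illustrative placeholders, not the paper's experimental setup.

```python
import numpy as np
import tensorflow as tf

# Sketch: compare how fast identical models fit (a) real labels vs. (b) random labels.
(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype("float32") / 255.0
y_noise = np.random.randint(0, 10, size=y.shape)  # labels replaced with uniform noise

def make_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

histories = {}
for name, labels in [("real", y), ("noise", y_noise)]:
    model = make_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    histories[name] = model.fit(x, labels, epochs=5, batch_size=128, verbose=0)

# Real labels are typically fit within the first few epochs, while random labels
# take far longer to memorize, consistent with "simple patterns first".
for name, h in histories.items():
    print(name, [round(a, 3) for a in h.history["accuracy"]])
```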


The Natural Auditor: How To Tell If Someone Used Your Words To Train Their Model

arXiv.org Machine Learning

To help enforce data-protection regulations such as GDPR and detect unauthorized uses of personal data, we propose a new model auditing technique that enables users to check if their data was used to train a machine learning model. We focus on auditing deep-learning models that generate natural-language text, including word prediction and dialog generation. These models are at the core of many popular online services. Furthermore, they are often trained on very sensitive personal data, such as users' messages, searches, chats, and comments. We design and evaluate an effective black-box auditing method that can detect, with very few queries to a model, if a particular user's texts were used to train it (among thousands of other users). In contrast to prior work on membership inference against ML models, we do not assume that the model produces numeric confidence values. We empirically demonstrate that we can successfully audit models that are well-generalized and not overfitted to the training data. We also analyze how text-generation models memorize word sequences and explain why this memorization makes them amenable to auditing.
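
The following sketch conveys the general flavor of such a black-box, rank-based audit. It is not the paper's exact method: model_topk, the candidate-list interface, and the scoring below are assumptions introduced purely for illustration.

```python
import numpy as np

# Hypothetical sketch of a black-box audit in the spirit described above:
# query a word-prediction model on a user's texts, record the rank of each
# true next word in the model's returned candidate list (no numeric
# confidence scores needed), and compare against ranks on reference texts
# known to be outside the training data.
def rank_of_true_word(candidates, true_word):
    """Rank (1 = top) of the true word in the model's candidate list."""
    try:
        return candidates.index(true_word) + 1
    except ValueError:
        return len(candidates) + 1  # treat "not in the list" as the worst rank

def audit_score(model_topk, user_texts):
    """Average rank of the true continuations over a user's (context, word) pairs.

    model_topk(context) -> list of candidate words, best first (assumed interface).
    """
    ranks = [rank_of_true_word(model_topk(ctx), word) for ctx, word in user_texts]
    return float(np.mean(ranks))
```

A user would compare the average rank on their own texts against the average rank on reference texts known not to be in the training set; a markedly lower rank on their own texts is evidence that the model was trained on them.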