
Collaborating Authors

 Christopher K.I. Williams


On Memorization in Probabilistic Deep Generative Models

Neural Information Processing Systems

While experimenting with the proposed memorization score on CIFAR-10 [47], we noticed that the images of automobiles shown in Figure 6 are present in the training set multiple times (with slight variation). These works are recently proposed probabilistic generative models that achieve impressive performance on sample quality metrics such as the inception score (IS) [35] and the Fréchet inception distance (FID) [36], and also achieve high log likelihoods. However, the fact that we were able to serendipitously spot images from the training set in the generated samples might suggest that some unintended memorization has occurred. We do not know if there are other images in the presented samples that are present in the training data. Of course, spotting near duplicates of training observations is only possible because these models yield realistic samples.

Figure 6: Examples of images from the CIFAR-10 training set that were spotted in illustrations of samples from the model in recent work on generative models.
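Spotting such near duplicates by eye can be formalized as a nearest neighbor test between generated samples and the training set. Below is a minimal sketch, assuming images are flattened NumPy arrays and using Euclidean distance in pixel space; this is a simplification (feature-space distances are a common alternative), and none of the names are from the paper.

import numpy as np

def nearest_training_neighbors(samples, train, k=1):
    # For each generated sample, find the k closest training images by
    # Euclidean distance in flattened pixel space. samples: (m, d),
    # train: (n, d). Returns (indices, distances), each of shape (m, k).
    # Uses the expansion ||s - t||^2 = ||s||^2 - 2 s.t + ||t||^2.
    s2 = np.sum(samples ** 2, axis=1, keepdims=True)   # (m, 1)
    t2 = np.sum(train ** 2, axis=1)                    # (n,)
    d2 = s2 - 2.0 * samples @ train.T + t2             # (m, n)
    idx = np.argsort(d2, axis=1)[:, :k]
    d2k = np.take_along_axis(d2, idx, axis=1)
    return idx, np.sqrt(np.maximum(d2k, 0.0))          # clip float noise

# Toy usage: plant two near copies of training images among the samples
# and flag any sample whose nearest training image is suspiciously close.
rng = np.random.default_rng(0)
train = rng.random((1000, 32 * 32 * 3))
samples = np.vstack([rng.random((8, 32 * 32 * 3)), train[:2] + 0.01])
idx, dist = nearest_training_neighbors(samples, train)
print(np.where(dist[:, 0] < 1.0)[0])   # prints [8 9], the planted copies

As the abstract below argues, such tests can miss memorization that does not manifest as pixel-level similarity, which motivates the likelihood-based score proposed in the paper.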



On Memorization in Probabilistic Deep Generative Models

Neural Information Processing Systems

Recent advances in deep generative models have led to impressive results in a variety of application domains. Motivated by the possibility that deep learning models might memorize part of the input data, there have been increased efforts to understand how memorization arises. In this work, we extend a recently proposed measure of memorization for supervised learning (Feldman, 2019) to the unsupervised density estimation problem and adapt it to be more computationally efficient. Next, we present a study that demonstrates how memorization can occur in probabilistic deep generative models such as variational autoencoders. This reveals that the form of memorization to which these models are susceptible differs fundamentally from mode collapse and overfitting. Furthermore, we show that the proposed memorization score measures a phenomenon that is not captured by commonly-used nearest neighbor tests. Finally, we discuss several strategies that can be used to limit memorization in practice. Our work thus provides a framework for understanding problematic memorization in probabilistic generative models.
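The score adapted from Feldman's supervised definition compares how likely the model finds a training point when that point was, versus was not, in its training set. Below is a minimal sketch of the idea, with a K-fold approximation standing in for exact leave-one-out retraining, in the spirit of the computational shortcut the abstract mentions; fit and log_prob are hypothetical placeholders, not the authors' implementation.

import numpy as np

def memorization_scores(data, fit, log_prob, n_splits=5, seed=0):
    # Approximate leave-one-out memorization scores with K folds: train
    # one model on all data and n_splits models that each exclude one
    # fold, then score every point x as
    #     M(x) = log p_full(x) - log p_heldout(x),
    # where p_heldout comes from the model whose training set excluded x.
    # A large M(x) means x is far more likely only when it was trained on.
    n = len(data)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(np.arange(n) % n_splits)
    full_ll = log_prob(fit(data), data)        # in-sample log-likelihood
    heldout_ll = np.empty(n)
    for k in range(n_splits):
        mask = folds == k
        model_k = fit(data[~mask])             # trained without fold k
        heldout_ll[mask] = log_prob(model_k, data[mask])
    return full_ll - heldout_ll

# Toy usage: a diagonal Gaussian stands in for the generative model;
# for a VAE one would plug in its (approximate) log-likelihood instead.
def fit(x):
    return x.mean(axis=0), x.std(axis=0) + 1e-6

def log_prob(model, x):
    mu, sigma = model
    return -0.5 * (((x - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2)).sum(axis=1)

data = np.random.default_rng(1).normal(size=(200, 2))
print(memorization_scores(data, fit, log_prob)[:5])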