Memorization Through the Lens of Curvature of Loss Function Around Samples

Isha Garg, Deepak Ravikumar, Kaushik Roy

arXiv.org Artificial Intelligence 

Deep neural networks are over-parameterized and easily overfit the datasets they train on. In the extreme case, it has been shown that these networks can memorize a training set with fully randomized labels. We propose using the curvature of the loss function around each training sample, averaged over training epochs, as a measure of how strongly that sample is memorized. We use this metric to study the generalization versus memorization properties of different samples in popular image datasets and show that it captures memorization statistics well, both qualitatively and quantitatively. We first show that high-curvature samples visually correspond to long-tailed, mislabeled, or conflicting samples, i.e., those most likely to be memorized. This analysis reveals, to the best of our knowledge, a novel failure mode on the CIFAR100 and ImageNet datasets: duplicated images carrying different labels. Quantitatively, we corroborate the validity of our scores in two ways. First, we validate them against an independent and comprehensively calculated baseline, showing high cosine similarity with the memorization scores released by Feldman & Zhang (2020). Second, we inject corrupted samples that the network memorizes and show that these are learned with high curvature. An added advantage of our method is its scalability: it requires training only a single network, as opposed to the thousands trained by the baseline, while still capturing the aforementioned failure mode that the baseline fails to identify.
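The abstract does not spell out how the per-sample curvature is estimated. Below is a minimal PyTorch sketch of one common way to score it: a Hutchinson-style estimate of the trace of the input Hessian of the loss, obtained by finite differences of input gradients along random Rademacher directions. The function name `input_curvature_score`, the probe count `n_probes`, and the step size `h` are illustrative assumptions rather than the authors' implementation, and the paper's metric is additionally averaged over training epochs.

```python
import torch
import torch.nn.functional as F

def input_curvature_score(model, x, y, n_probes=10, h=1e-3):
    """Hutchinson-style estimate of the curvature (trace of the input Hessian)
    of the loss around a single sample x (with batch dim) and label y.

    Sketch only: n_probes and h are illustrative hyperparameters, and the
    paper's score is additionally averaged over training epochs.
    """
    model.eval()

    # Gradient of the loss at the unperturbed input.
    x0 = x.clone().detach().requires_grad_(True)
    grad0 = torch.autograd.grad(F.cross_entropy(model(x0), y), x0)[0]

    score = 0.0
    for _ in range(n_probes):
        # Random Rademacher (+/-1) probe direction in input space.
        v = torch.randint_like(x, 2) * 2 - 1

        # Gradient at the perturbed input x + h*v.
        xp = (x + h * v).detach().requires_grad_(True)
        gradp = torch.autograd.grad(F.cross_entropy(model(xp), y), xp)[0]

        # Finite-difference estimate of v^T H v; its expectation over
        # Rademacher probes v equals the trace of the input Hessian H.
        score += ((gradp - grad0) * v).sum().item() / h

    return score / n_probes
```

Accumulating this score at checkpoints saved during training and dividing by the number of checkpoints would approximate the epoch-averaged statistic described in the abstract.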
