Measures of Information Reflect Memorization Patterns

Neural Information Processing Systems 

Neural networks are known to exploit spurious artifacts (or shortcuts) that co-occur with a target label, exhibiting heuristic memorization. Networks have also been shown to memorize training examples, resulting in example-level memorization. Both kinds of memorization impede the generalization of networks beyond their training distributions. Detecting such memorization can be challenging, often requiring researchers to curate tailored test sets. In this work, we hypothesize, and subsequently show, that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.