Memorizing Long-tail Data Can Help Generalization Through Composition
Mo Zhou, Haoyang Ma, Rong Ge
The relationship between memorization and generalization has long been an intriguing topic in deep learning. It is well known that neural networks used for supervised learning can memorize noisy or even random labels (Zhang et al., 2017), and recent large language models can memorize long passages of text despite not being overparametrized (Carlini et al., 2019, 2022). Conventional wisdom from statistical learning theory suggests that memorization should hurt generalization, yet neural networks often generalize well while memorizing, sometimes even better than networks that do not memorize. Efforts to understand this relationship theoretically have led to the ideas of implicit regularization and benign overfitting (Belkin et al., 2019; Bartlett et al., 2020; Belkin et al., 2020; Hastie et al., 2022; Gunasekar et al., 2017), where the training process and architectural choices of neural networks prefer solutions that generalize well despite memorizing the training data. Another interesting line of work (Feldman, 2020; Feldman and Zhang, 2020) demonstrated that memorization can actively help generalization by capturing long-tail behaviors in the training data.
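The long-tail argument in that last line can be made concrete with a small simulation. The sketch below is illustrative only and is not the paper's construction: it assumes subpopulation frequencies follow a Zipf-like law, that each subpopulation carries its own label, and that a "non-memorizer" behaves like a learner that discards training examples seen only once; all names (`n_subpops`, `freqs`, etc.) are hypothetical.

```python
# Toy sketch of Feldman's (2020) long-tail intuition (illustrative, not from the paper).
# Assumption: data comes from many subpopulations with Zipf-like frequencies, each
# with its own label, so tail subpopulations appear at most once in a finite sample.
import numpy as np

rng = np.random.default_rng(0)
n_subpops = 1000
freqs = 1.0 / np.arange(1, n_subpops + 1)   # Zipf-like long-tailed frequencies
freqs /= freqs.sum()

n_train, n_test = 2000, 2000
train = rng.choice(n_subpops, size=n_train, p=freqs)  # subpopulation ids seen in training
test = rng.choice(n_subpops, size=n_test, p=freqs)

seen = set(train)                                     # everything a memorizer retains
counts = np.bincount(train, minlength=n_subpops)
singletons = set(np.where(counts == 1)[0])            # examples appearing exactly once

# A memorizer answers correctly on any subpopulation it has seen, even once;
# a non-memorizer drops singleton examples and so misses those subpopulations.
acc_memorizer = np.mean([s in seen for s in test])
acc_non_memorizer = np.mean([(s in seen) and (s not in singletons) for s in test])

print(f"memorizer accuracy:     {acc_memorizer:.3f}")
print(f"non-memorizer accuracy: {acc_non_memorizer:.3f}")
```

In this toy model the gap between the two predictors is exactly the test mass that falls on subpopulations seen only once in training, which is substantial under a long-tailed distribution: memorizing those singletons is the only way to get their future occurrences right.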