Memorizing Long-tail Data Can Help Generalization Through Composition
Mo Zhou, Haoyang Ma, Rong Ge
The relationship between memorization and generalization has long been an intriguing topic in deep learning. It is well known that neural networks used for supervised learning can memorize noisy or even random labels (Zhang et al., 2017), and recent large language models can memorize long passages of text despite not being overparametrized (Carlini et al., 2019, 2022). Conventional wisdom from statistical learning theory suggests that memorization should hurt generalization, yet neural networks often generalize well while memorizing, sometimes even better than networks that do not memorize. Efforts to understand this relationship theoretically have led to the ideas of implicit regularization and benign overfitting (Belkin et al., 2019; Bartlett et al., 2020; Belkin et al., 2020; Hastie et al., 2022; Gunasekar et al., 2017), where the training process and architectural choices of neural networks prefer solutions that generalize well despite memorizing the training data. Another interesting line of work (Feldman, 2020; Feldman and Zhang, 2020) demonstrated that memorization can actively help generalization by capturing long-tail behaviors in the training data.
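The long-tail argument in that last line can be made concrete with a small simulation. The sketch below is illustrative only and is not the paper's construction: it assumes subpopulation frequencies follow a Zipf-like law, that each subpopulation carries its own label, and that a "non-memorizer" behaves like a learner that discards training examples seen only once; all names (`n_subpops`, `freqs`, etc.) are hypothetical.

```python
# Toy sketch of Feldman's (2020) long-tail intuition (illustrative, not from the paper).
# Assumption: data comes from many subpopulations with Zipf-like frequencies, each
# with its own label, so tail subpopulations appear at most once in a finite sample.
import numpy as np

rng = np.random.default_rng(0)
n_subpops = 1000
freqs = 1.0 / np.arange(1, n_subpops + 1)   # Zipf-like long-tailed frequencies
freqs /= freqs.sum()

n_train, n_test = 2000, 2000
train = rng.choice(n_subpops, size=n_train, p=freqs)  # subpopulation ids seen in training
test = rng.choice(n_subpops, size=n_test, p=freqs)

seen = set(train)                                     # everything a memorizer retains
counts = np.bincount(train, minlength=n_subpops)
singletons = set(np.where(counts == 1)[0])            # examples appearing exactly once

# A memorizer answers correctly on any subpopulation it has seen, even once;
# a non-memorizer drops singleton examples and so misses those subpopulations.
acc_memorizer = np.mean([s in seen for s in test])
acc_non_memorizer = np.mean([(s in seen) and (s not in singletons) for s in test])

print(f"memorizer accuracy:     {acc_memorizer:.3f}")
print(f"non-memorizer accuracy: {acc_non_memorizer:.3f}")
```

In this toy model the gap between the two predictors is exactly the test mass that falls on subpopulations seen only once in training, which is substantial under a long-tailed distribution: memorizing those singletons is the only way to get their future occurrences right.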