Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training

Yang, Yanlai, Jones, Matt, Mozer, Michael C., Ren, Mengye

Mar-14-2024–arXiv.org Artificial Intelligence

We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, recovering from the forgetting on documents before encountering them again. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Mar-14-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York (0.04)
  - Colorado > Boulder County
    - Boulder (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report
  - New Finding (0.68)
  - Experimental Study (0.46)

Industry:
- Education (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found