From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics

Jun-16-2026, 13:18:22 GMT–Neural Information Processing Systems

Although Transformer-based models achieve exceptional empirical performance, the principles governing their training dynamics remain poorly characterized beyond configuration-specific studies. Motivated by empirical evidence that small initialization scales improve reasoning capabilities in language models, we use the gradient flow analytical framework of Zhou et al. [2022] to systematically investigate the training dynamics of linearized Transformers.
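The abstract names gradient flow under small initialization as its analytical tool. The following is a minimal sketch under a hypothetical toy setup that is not taken from the paper. It assumes "linearized" means a softmax-free, single-head attention readout f = q^T W_Q W_K^T C w_v, where C is the token covariance of a sequence and q is its last token. It also assumes a rank-one teacher and squared loss. Plain gradient descent with a small step size stands in for gradient flow as its explicit Euler discretization, and an entropy-based effective rank of W_Q W_K^T is one way to watch for a rank drop. All names (`make_data`, `loss_and_grads`, `effective_rank`, `train`) and all hyperparameters are illustrative assumptions, not the authors' construction.

```python
import numpy as np


def make_data(rng, batch=256, n_tokens=16, d=8):
    """Hypothetical toy task: random token sequences labeled by a rank-one teacher."""
    X = rng.standard_normal((batch, n_tokens, d))
    Q = X[:, -1, :]                                  # last token plays the query, shape (B, d)
    Cs = np.einsum("bni,bnj->bij", X, X) / n_tokens  # per-sequence C_b = X_b^T X_b / n, (B, d, d)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    # Assumed teacher: W_Q W_K^T = u u^T and w_v = u, so y_b = (q_b . u) (u^T C_b u).
    y = (Q @ u) * np.einsum("i,bij,j->b", u, Cs, u)
    return Q, Cs, y


def loss_and_grads(params, Q, Cs, y):
    """Squared loss and exact gradients for f_b = q_b^T W_Q W_K^T C_b w_v (no softmax)."""
    W_Q, W_K, w_v = params
    Cw = Cs @ w_v                   # rows c_b = C_b w_v, shape (B, d)
    U = Cw @ W_K                    # rows (W_K^T c_b)^T, shape (B, k)
    A = Q @ W_Q                     # rows (W_Q^T q_b)^T, shape (B, k)
    r = np.sum(A * U, axis=1) - y   # residuals f_b - y_b
    loss = 0.5 * np.mean(r ** 2)
    g = r / y.shape[0]              # dL/df_b
    grad_WQ = Q.T @ (g[:, None] * U)
    grad_WK = Cw.T @ (g[:, None] * A)
    MTq = A @ W_K.T                 # rows (M^T q_b)^T with M = W_Q W_K^T
    grad_wv = g @ np.einsum("bij,bj->bi", Cs, MTq)
    return loss, (grad_WQ, grad_WK, grad_wv)


def effective_rank(M, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of normalized singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]
    return float(np.exp(-np.sum(p * np.log(p))))


def train(init_scale=1e-2, lr=0.05, steps=6_000, d=8, k=8, seed=0):
    """Plain gradient descent, i.e. an explicit Euler discretization of gradient flow."""
    rng = np.random.default_rng(seed)
    Q, Cs, y = make_data(rng, d=d)
    # Small initialization: every parameter entry is drawn at scale init_scale.
    params = [init_scale * rng.standard_normal((d, k)),
              init_scale * rng.standard_normal((d, k)),
              init_scale * rng.standard_normal(d)]
    losses, ranks = [], []
    for t in range(steps + 1):
        loss, grads = loss_and_grads(params, Q, Cs, y)
        losses.append(loss)
        ranks.append(effective_rank(params[0] @ params[1].T))
        if t < steps:
            params = [p - lr * g for p, g in zip(params, grads)]
    return losses, ranks, params


if __name__ == "__main__":
    losses, ranks, _ = train()
    for t in (0, 1_500, 3_000, 4_500, 6_000):
        print(f"step {t:5d}  loss {losses[t]:.4f}  eff. rank of W_Q W_K^T {ranks[t]:.2f}")
```

Running the file prints the loss and the effective rank of W_Q W_K^T at a few checkpoints. Whether a loss plateau followed by a drop in effective rank appears, and when, depends on the assumed `init_scale`, `lr`, and step count. The sketch illustrates the kind of quantities one can track and does not reproduce the paper's analysis or results.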

artificial intelligence, machine learning, natural language

Country:
- Asia > China (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
