From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
