Hidden Dynamics of Massive Activations in Transformer Training
