Large-Step Training Dynamics of a Two-Factor Linear Transformer Model
