Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
