Understanding Softmax Attention Layers: Exact Mean-Field Analysis on a Toy Problem
