On the Convergence of Encoder-only Shallow Transformers
