On the Convergence of Encoder-only Shallow Transformers