Attention with Trained Embeddings Provably Selects Important Tokens
