Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
