A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity