Vision Transformers provably learn spatial structure
