Inducing Spatial Locality in Vision Transformers through the Training Protocol