XCiT: Cross-Covariance Image Transformers

Appendix A: Preliminary study on Vision Transformers (ViT)