ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
