ViT -- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale -- ICLR'21
