Image Captioners Are Scalable Vision Learners Too
