Image Captioners Are Scalable Vision Learners Too

Michael Tschannen, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer