Image Captioners Are Scalable Vision Learners Too