Multimodal Autoregressive Pre-training of Large Vision Encoders