Revisiting the Role of Language Priors in Vision-Language Models