Why do LLaVA Vision-Language Models Reply to Images in English?
