Why do LLaVA Vision-Language Models Reply to Images in English?