Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
