CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

May-26-2025, 17:02:34 GMT – Neural Information Processing Systems

Visual Question Answering (VQA) is an important task in multimodal AI that requires models to understand and reason over knowledge present in visual and textual data. However, most current VQA datasets and models focus primarily on English and a few major world languages, with images that are Western-centric. While recent efforts have tried to increase the number of languages covered in VQA datasets, they still lack diversity in low-resource languages. More importantly, some datasets extend the text to other languages, either via translation or other approaches, but usually keep the same images, resulting in narrow cultural representation. To address these limitations, we create CVQA, a new Culturally-diverse Multilingual Visual Question Answering benchmark dataset designed to cover a rich set of languages and regions, where we engage native speakers and cultural experts in the data collection process.

artificial intelligence, cvqa, question answering, (4 more...)

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.06)
- Asia > Middle East
  - Israel (0.06)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.85)