CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero
Neural Information Processing Systems
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason over knowledge present in both visual and textual data. However, most current VQA models rely on datasets that focus primarily on English and a few major world languages, with images that are typically Western-centric.
May-28-2025, 13:33:45 GMT
- Country:
- Africa (0.93)
- Asia > Indonesia
- Europe (1.00)
- North America (1.00)
- South America (1.00)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Information Technology (0.46)
- Technology: