RICE-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries
Pranav, Tushar; Pandey, Eshan; Bala, Austria Lyka Diane; Chadha, Aman; Atmosukarto, Indriyati; Lock, Donny Soh Cheng
arXiv.org Artificial Intelligence
Vision-Language Models (VLMs) excel in multimodal tasks but often exhibit Western-centric biases, limiting their effectiveness in culturally diverse regions like Southeast Asia (SEA). To address this, we introduce RICE-VL, a novel benchmark evaluating VLM cultural understanding across 11 ASEAN countries. RICE-VL includes over 28,000 human-curated Visual Question Answering (VQA) samples, covering True or False, Fill-in-the-Blank, and open-ended formats, and 1,000 image-bounding box pairs for Visual Grounding, annotated by culturally informed experts across 14 sub-ground categories. We propose SEA-LAVE, an extension of the LAVE metric that assesses textual accuracy, cultural alignment, and country identification. Evaluations of six open- and closed-source VLMs reveal significant performance gaps in low-resource countries and abstract cultural domains. The Visual Grounding task tests models' ability to localize culturally significant elements in complex scenes, probing spatial and contextual accuracy. RICE-VL exposes limitations in VLMs' cultural comprehension and highlights the need for inclusive model development to better serve diverse global populations.
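The abstract does not state how Visual Grounding predictions are scored. A common convention in grounding benchmarks is to count a predicted box as correct when its intersection over union (IoU) with the annotated box reaches a threshold such as 0.5. The sketch below assumes that convention; the names `Box`, `iou`, and `grounding_accuracy` and the 0.5 threshold are hypothetical illustrations, not RICE-VL's published scoring procedure.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(b: Box) -> float:
    # Degenerate or inverted boxes (e.g. an empty intersection) get zero area.
    return max(0.0, b.x_max - b.x_min) * max(0.0, b.y_max - b.y_min)


def iou(pred: Box, gold: Box) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    inter = Box(
        max(pred.x_min, gold.x_min),
        max(pred.y_min, gold.y_min),
        min(pred.x_max, gold.x_max),
        min(pred.y_max, gold.y_max),
    )
    inter_area = area(inter)
    union = area(pred) + area(gold) - inter_area
    return inter_area / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box], golds: Sequence[Box], threshold: float = 0.5
) -> float:
    """Fraction of predictions with IoU >= threshold.

    ASSUMPTION: the 0.5 threshold is a common default, not a value
    taken from the RICE-VL paper.
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, golds))
    return hits / len(preds)


if __name__ == "__main__":
    preds = [Box(0, 0, 2, 2), Box(0, 0, 1, 1)]
    golds = [Box(0, 0, 2, 2), Box(2, 2, 3, 3)]
    print(grounding_accuracy(preds, golds))
```

Here the first prediction matches its gold box exactly (IoU 1.0) and the second does not overlap at all (IoU 0.0), so accuracy is 0.5. A thresholded IoU rewards roughly correct localization without requiring pixel-exact boxes, which suits crowded scenes where annotators' box edges vary.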
Dec 2, 2025
- Country:
  - Asia
    - Brunei (0.05)
    - Cambodia (0.05)
    - Malaysia (0.05)
    - Vietnam (0.05)
    - Indonesia (0.05)
    - Laos (0.05)
    - Philippines (0.05)
    - Timor-Leste (0.05)
    - Taiwan (0.04)
    - Myanmar (0.05)
    - Southeast Asia (0.26)
    - Thailand (0.05)
    - Singapore (0.06)
  - Europe
    - Austria (0.04)
    - Switzerland > Zürich
      - Zürich (0.14)
  - North America > United States
    - California > Santa Clara County
      - Palo Alto (0.04)
    - Florida > Miami-Dade County
      - Miami (0.04)
- Genre:
  - Research Report (1.00)
- Technology: