TaiwanVQA: Benchmarking and Enhancing Cultural Understanding in Vision-Language Models

Jun-15-2026, 10:16:45 GMT–Neural Information Processing Systems

Vision-language models (VLMs) often struggle with culturally specific content, a challenge largely overlooked by existing benchmarks that focus on dominant languages and globalized datasets. We introduce TaiwanVQA, a visual question answering (VQA) benchmark designed for Taiwanese culture to evaluate recognition and reasoning in regional contexts. TaiwanVQA contains 2,736 images and 5,472 manually curated questions covering topics such as traditional foods, public signs, festivals, and landmarks. The official benchmark set includes 1,000 images and 2,000 questions for systematic assessment; the remaining 1,736 images and 3,472 questions serve as training material. Evaluations of state-of-the-art VLMs reveal strong visual recognition but notable weaknesses in cultural reasoning.

benchmark, large language model, machine learning

Country:
- Asia > Taiwan (0.30)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Leisure & Entertainment (1.00)
- Information Technology > Security & Privacy (1.00)
- Media (0.92)
- Law (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.97)
