CUS-QA: Local-Knowledge-Oriented Open-Ended Question Answering Dataset
Libovický, Jindřich, Helcl, Jindřich, Manea, Andrei, Vico, Gianluca
arXiv.org Artificial Intelligence
We introduce CUS-QA, a benchmark for open-ended regional question answering that encompasses both textual and visual modalities. We also provide strong baselines using state-of-the-art large language models (LLMs). Our dataset consists of manually curated questions and answers grounded in Wikipedia, created by native speakers from Czechia, Slovakia, and Ukraine, with accompanying English translations. It includes both purely textual questions and those requiring visual understanding. We evaluate state-of-the-art LLMs through prompting and complement this with human judgments of answer correctness. Using these human evaluations, we analyze the reliability of existing automatic evaluation metrics. Our baseline results show that even the best open-weight LLMs achieve only around 50% accuracy on textual questions and below 30% on visual questions. LLM-based evaluation metrics show strong correlation with human judgment, while traditional string-overlap metrics perform surprisingly well due to the prevalence of named entities in answers.
Aug-22-2025
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia
- Middle East
- Israel (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Singapore (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Vietnam > Hanoi
- Hanoi (0.04)
- Europe
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Czechia
- Moravian-Silesian Region > Ostrava (0.04)
- Prague (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Ukraine
- Kharkiv Oblast > Kharkiv (0.04)
- Kyiv Oblast > Kyiv (0.04)
- Slovakia > Košice
- Košice (0.04)
- Norway > Eastern Norway
- Oslo (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Austria > Vienna (0.14)
- North America
- Canada > Ontario
- Toronto (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California > Los Angeles County
- Long Beach (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Tennessee > Davidson County
- Nashville (0.04)
- Texas > Travis County
- Austin (0.04)
- Washington > King County
- Seattle (0.04)
- South America > Chile
- Genre:
- Research Report > New Finding (0.87)
- Industry:
- Leisure & Entertainment (0.92)
- Media (0.93)
- Technology: