NeoQA: Evidence-based Question Answering with Generated News Events
Glockner, Max, Jiang, Xiang, Ribeiro, Leonardo F. R., Gurevych, Iryna, Dreyer, Markus
arXiv.org Artificial Intelligence
Evaluating Retrieval-Augmented Generation (RAG) in large language models (LLMs) is challenging because benchmarks can quickly become stale. Questions initially requiring retrieval may become answerable from pretraining knowledge as newer models incorporate more recent information during pretraining, making it difficult to distinguish evidence-based reasoning from recall. We introduce NeoQA (News Events for Out-of-training Question Answering), a benchmark designed to address this issue. To construct NeoQA, we generated timelines and knowledge bases of fictional news events and entities along with news articles and Q&A pairs to prevent LLMs from leveraging pretraining knowledge, ensuring that no prior evidence exists in their training data. We propose our dataset as a new platform for evaluating evidence-based question answering, as it requires LLMs to generate responses exclusively from retrieved evidence and only when sufficient evidence is available. NeoQA enables controlled evaluation across various evidence scenarios, including cases with missing or misleading details. Our findings indicate that LLMs struggle to distinguish subtle mismatches between questions and evidence, and suffer from shortcut reasoning when key information required to answer a question is missing from the evidence, underscoring key limitations in evidence-based reasoning.
May-12-2025
- Country:
- Asia
- China > Hong Kong (0.04)
- Japan (0.04)
- Middle East
- Iran > Tehran Province
- Tehran (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Singapore (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Germany > Hesse
- Darmstadt Region > Darmstadt (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Tuscany
- Florence (0.04)
- United Kingdom > England
- Greater London > London (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- Dominican Republic (0.04)
- United States
- California (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- New York (0.04)
- North Dakota > Burke County (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- South America > Chile
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Transportation
- Automobiles & Trucks (0.67)
- Banking & Finance (1.00)
- Education (1.00)
- Government
- Foreign Policy (0.92)
- Immigration & Customs (0.67)
- Military (0.67)
- Regional Government (1.00)
- Media > News (1.00)
- Health & Medicine (1.00)
- Law
- Civil Rights & Constitutional Law (1.00)
- Criminal Law (0.67)
- Environmental Law (0.67)
- Leisure & Entertainment (0.92)
- Information Technology > Security & Privacy (1.00)
- Law Enforcement & Public Safety (0.67)
- Technology: