FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models
Andrew Zhu, Alyssa Hwang, Liam Dugan, Chris Callison-Burch
arXiv.org Artificial Intelligence
One type of question commonly found in day-to-day scenarios is the "fan-out" question: a complex multi-hop, multi-document reasoning question that requires finding information about a large number of entities. However, few resources exist to evaluate this type of question-answering capability in large language models. To evaluate complex reasoning in LLMs more fully, we present FanOutQA, a high-quality dataset of fan-out question-answer pairs and human-annotated decompositions with English Wikipedia as the knowledge base. We formulate three benchmark settings across our dataset and benchmark 7 LLMs, including GPT-4, LLaMA 2, Claude-2.1, and Mixtral-8x7B, finding that contemporary models still have room to improve at reasoning over inter-document dependencies in a long context. To encourage evaluation, we provide our dataset and open-source tools for running models at https://fanoutqa.com.
Jun-6-2024
- Country:
- Africa > Togo (0.04)
- Asia
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Singapore (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Italy > Tuscany
- Florence (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- North America > United States
- California
- Alameda County > Oakland (0.04)
- Los Angeles County > Los Angeles (0.05)
- Pennsylvania (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Rhode Island > Providence County
- Smithfield (0.04)
- New York
- Bronx County > New York City (0.04)
- Kings County > New York City (0.04)
- New York County > New York City (0.14)
- Queens County > New York City (0.04)
- Richmond County > New York City (0.04)
- Arizona > Maricopa County
- Phoenix (0.04)
- Ohio > Cuyahoga County
- Cleveland (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- South America > Colombia (0.04)
- Genre:
- Research Report (0.82)
- Technology: