MultiHoax: A Dataset of Multi-hop False-Premise Questions
Mohammadamin Shafiei, Hamidreza Saffari, Nafise Sadat Moosavi
arXiv.org Artificial Intelligence
As large language models (LLMs) are increasingly deployed in high-stakes domains, their ability to detect false assumptions and reason critically is crucial for ensuring reliable outputs. False-premise questions (FPQs) serve as an important evaluation method by exposing cases where flawed assumptions lead to incorrect responses. While existing benchmarks focus on single-hop FPQs, real-world reasoning often requires multi-hop inference, where models must verify consistency across multiple reasoning steps rather than relying on surface-level cues. To address this gap, we introduce MultiHoax, a benchmark for evaluating LLMs' ability to handle false premises in complex, multi-step reasoning tasks. Our dataset spans seven countries and ten diverse knowledge categories, using Wikipedia as the primary knowledge source to enable factual reasoning across regions. Experiments reveal that state-of-the-art LLMs struggle to detect false premises across different countries, knowledge categories, and multi-hop reasoning types, highlighting the need for improved false-premise detection and more robust multi-hop reasoning capabilities in LLMs.
Jun-5-2025
- Country:
- Asia
- China > Beijing
- Beijing (0.04)
- Indonesia (0.04)
- Japan > Honshū
- Chūbu > Aichi Prefecture
- Nagoya (0.04)
- Kantō > Tokyo Metropolis Prefecture
- Tokyo (0.04)
- Middle East
- Iran > Tehran Province
- Tehran (0.04)
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Singapore (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- France (0.04)
- Germany > Bavaria
- Upper Bavaria > Munich (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Sicily (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- United Kingdom (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- Dominican Republic (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- Illinois (0.04)
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- South America
- Brazil (0.04)
- Chile > Santiago Metropolitan Region
- Santiago Province > Santiago (0.04)
- Genre:
- Personal > Honors (1.00)
- Research Report (1.00)
- Industry:
- Education (0.93)
- Leisure & Entertainment > Sports
- Olympic Games (1.00)
- Soccer (0.93)
- Media > News (0.68)
- Technology: