WikiWhy: Answering and Explaining Cause-and-Effect Questions
Ho, Matthew; Sharma, Aditya; Chang, Justin; Saxon, Michael; Levy, Sharon; Lu, Yujie; Wang, William Yang
–arXiv.org Artificial Intelligence
As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language becomes more challenging. Recent question answering (QA) benchmarks that attempt to assess reasoning are often limited by a narrow scope of covered situations and subject matter. We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining, in natural language, why an answer is true. WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded in Wikipedia facts across a diverse set of topics. Each rationale is a set of supporting statements connecting the question to the answer. WikiWhy serves as a benchmark for the reasoning capabilities of LLMs because it requires a rigorous, explicit rationale for each answer to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer-and-explain condition, leaving significant room for future improvement.
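The question-answer-rationale structure described in the abstract can be pictured with a minimal Python sketch. The class name `WhyTriple`, its field names, the `as_text` method, and the `correctness` helper are hypothetical illustrations, not the dataset's released schema or the authors' evaluation code. The sample record is invented for illustration and is not drawn from WikiWhy.

```python
from dataclasses import dataclass, field


@dataclass
class WhyTriple:
    """Hypothetical record: a "why" question, its answer, and a rationale.

    Field names are illustrative assumptions, not WikiWhy's actual schema.
    """
    question: str
    answer: str
    # Ordered supporting statements that connect the question to the answer.
    rationale: list[str] = field(default_factory=list)

    def as_text(self) -> str:
        # Render the triple as readable lines: question, answer, then each
        # supporting statement of the rationale.
        lines = [f"Q: {self.question}", f"A: {self.answer}"]
        lines += [f"  because: {step}" for step in self.rationale]
        return "\n".join(lines)


def correctness(judgments: list[bool]) -> float:
    """Fraction of model outputs that human raters marked correct.

    A simplified stand-in for human-evaluated correctness; the paper's
    actual rating protocol is not reproduced here.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


# Invented example, not taken from the dataset.
example = WhyTriple(
    question="Why did the river overflow its banks?",
    answer="Several days of heavy rainfall",
    rationale=[
        "Heavy rainfall adds large volumes of water to a river's catchment.",
        "When inflow exceeds the channel's capacity, water spills over the banks.",
    ],
)

if __name__ == "__main__":
    print(example.as_text())
    print(correctness([True, False, False, True]))
```

Representing the rationale as an ordered list of statements mirrors the abstract's description of a rationale as "a set of supporting statements"; a plain fraction of rater approvals is the simplest reading of a percentage correctness score.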
Nov-30-2022
- Country:
- Indian Ocean > Arabian Gulf (0.04)
- South America > Chile
- Oceania
- New Zealand (0.04)
- Australia (0.04)
- North America
- Dominican Republic (0.04)
- Canada (0.04)
- United States
- Texas (0.04)
- New York > New York County
- New York City (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > Santa Barbara County
- Santa Barbara (0.04)
- Europe
- Russia (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Italy > Tuscany
- Florence (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Russia (0.04)
- Japan (0.04)
- Middle East
- Republic of Türkiye (0.04)
- Israel (0.04)
- Saudi Arabia > Arabian Gulf (0.04)
- Genre:
- Research Report (0.65)
- Industry:
- Energy (0.93)
- Education (0.68)
- Materials > Metals & Mining (0.46)
- Technology: