On the Evaluation of Common-Sense Reasoning in Natural Language Understanding

Trichelair, Paul, Emami, Ali, Cheung, Jackie Chi Kit, Trischler, Adam, Suleman, Kaheer, Diaz, Fernando

Nov-5-2018, arXiv.org Artificial Intelligence

The NLP and ML communities have long been interested in developing models capable of common-sense reasoning, and recent works have significantly improved the state of the art on benchmarks like the Winograd Schema Challenge (WSC). Despite these advances, the complexity of tasks designed to test common-sense reasoning remains under-analyzed. In this paper, we make a case study of the Winograd Schema Challenge and, based on two new measures of instance-level complexity, design a protocol that both clarifies and qualifies the results of previous work. Our protocol accounts for the WSC's limited size and variable instance difficulty, properties common to other common-sense benchmarks. Accounting for these properties when assessing model results may prevent unjustified conclusions.

artificial intelligence, commonsense reasoning, natural language

Country:
- North America > Canada > Quebec > Montreal (0.14)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Commonsense Reasoning (1.00)
  - Natural Language (1.00)
