Commonsense Reasoning
An Analysis of Dataset Overlap on Winograd-Style Tasks
Emami, Ali, Trischler, Adam, Suleman, Kaheer, Cheung, Jackie Chi Kit
The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model performance on the WSC has quickly progressed from chance-level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlap between these training corpora and the test instances in WSC-style tasks. We find that a large number of test instances overlap considerably with the corpora on which state-of-the-art models are (pre)trained, and that a significant drop in classification accuracy occurs when we evaluate models on instances with minimal overlap. Based on these results, we develop the KnowRef-60K dataset, which consists of over 60k pronoun disambiguation problems scraped from web data. KnowRef-60K is the largest corpus to date for WSC-style common-sense reasoning and exhibits a significantly lower proportion of overlaps with current pretraining corpora.
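One way to picture the overlap analysis described above is a simple n-gram comparison between a test instance and a corpus. This is only a minimal sketch: the abstract does not specify the overlap measure the authors use, so the trigram default, the whitespace tokenization, and the function names `ngrams` and `overlap_ratio` are all illustrative assumptions.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(instance: str, corpus_docs: Iterable[str], n: int = 3) -> float:
    """Fraction of the instance's n-grams that also occur in the corpus.

    Hypothetical measure for illustration; not the paper's definition.
    """
    instance_grams = ngrams(instance, n)
    if not instance_grams:
        return 0.0  # instance shorter than n tokens
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(instance_grams & corpus_grams) / len(instance_grams)


if __name__ == "__main__":
    test_instance = "The trophy does not fit in the suitcase"
    corpus = ["the trophy does not fit on the shelf"]
    print(overlap_ratio(test_instance, corpus))
```

Under a measure like this, instances with a ratio near zero would form the "minimal overlap" subset on which accuracy can be compared against the full test set.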
I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning
Lim, Jungwoo, Oh, Dongsuk, Jang, Yoonna, Yang, Kisu, Lim, Heuiseok
CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve performance with distributed representations, without considering the process of predicting the answer from the semantic representation of the question. To shed light upon the semantic interpretation of the question, we propose an AMR-ConceptNet-Pruned (ACP) graph. The ACP graph is pruned from a full integrated graph encompassing an Abstract Meaning Representation (AMR) graph generated from input questions and an external commonsense knowledge graph, ConceptNet (CN). The ACP graph is then exploited to interpret the reasoning path as well as to predict the correct answer on the CommonsenseQA task. This paper presents the manner in which the commonsense reasoning process can be interpreted with the relations and concepts provided by the ACP graph. Moreover, ACP-based models are shown to outperform the baselines.
Advanced Semantics for Commonsense Knowledge Extraction
Nguyen, Tuan-Phong, Razniewski, Simon, Weikum, Gerhard
Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.
Seeking Artificial Common Sense
Although artificial intelligence (AI) has made great strides in recent years, it still struggles to provide useful guidance about unstructured events in the physical or social world. In short, computer programs lack common sense. "Think of it as the tens of millions of rules of thumb about how the world works that are almost never explicitly communicated," said Doug Lenat of Cycorp, in Austin, TX. Beyond these implicit rules, though, commonsense systems need to make proper deductions from them and from other, explicit statements, he said. "If you are unable to do logical reasoning, then you don't have common sense."
Differentiable Open-Ended Commonsense Reasoning
Lin, Bill Yuchen, Sun, Haitian, Dhingra, Bhuwan, Zaheer, Manzil, Ren, Xiang, Cohen, William W.
Current commonsense reasoning research mainly focuses on developing models that use commonsense knowledge to answer multiple-choice questions. However, systems designed to answer multiple-choice questions may not be useful in applications that do not provide a small list of possible candidate answers to choose from. As a step towards making commonsense reasoning research more realistic, we propose to study open-ended commonsense reasoning (OpenCSR) -- the task of answering a commonsense question without any pre-defined choices, using as a resource only a corpus of commonsense facts written in natural language. The task is challenging due to a much larger decision space, and because many commonsense questions require multi-hop reasoning. We propose an efficient differentiable model for multi-hop reasoning over knowledge facts, named DrFact. We evaluate our approach on a collection of re-formatted, open-ended versions of popular tests targeting commonsense reasoning, and show that our approach outperforms strong baseline methods by a large margin.
Deriving Commonsense Inference Tasks from Interactive Fictions
Yu, Mo, Guo, Xiaoxiao, Feng, Yufei, Zhu, Xiaodan, Greenspan, Michael, Campbell, Murray
When playing an Interactive Fiction (IF) game, we explore and progress through a fantasy world by observing textual descriptions and sending text commands to control the protagonist. While in pure texts, we relate the implicit knowledge of these fantasy worlds with those in our physical world. For example, we explore unvisited regions by planning over the mentioned locations (spatial relations); we eat apples to recover health and attack the enemies with swords, but not vice versa (physical interaction relations); we retrospect the poor choice of breaking the lantern when we find the protagonist in a dangerous dark wood (cause and effects). Plentiful and diverse commonsense knowledge from our physical world is encoded in our game playing
For example, most benchmarks focus on collocation, association or other relations (e.g., ConceptNet (Speer et al., 2016) relations) between words or concepts (Levesque et al., 2012; Talmor et al., 2019; Mullenbach et al., 2019; Jiang et al., 2020). Other examples include temporal commonsense (Zhou et al., 2019), physical interactions between action and objects (Bisk et al., 2020), emotions and behaviors of people under the given situation (Sap et al., 2019b), and cause-effects between events and states (Sap et al., 2019a; Bhagavatula et al., 2019; Huang et al., 2019). Second, the task form makes them more likely commonsense validation, i.e., validation between a commonsense fact and a text statement, but neglecting hops among multiple facts.
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Lei, Jie, Yu, Licheng, Berg, Tamara L., Bansal, Mohit
Given a video with aligned dialogue, people can often infer what is more likely to happen next. Making such predictions requires not only a deep understanding of the rich dynamics underlying the video and dialogue, but also a significant amount of commonsense knowledge. In this work, we explore whether AI models are able to learn to make such multimodal commonsense next-event predictions. To support research in this direction, we collect a new dataset, named Video-and-Language Event Prediction (VLEP), with 28,726 future event prediction examples (along with their rationales) from 10,234 diverse TV Show and YouTube Lifestyle Vlog video clips. In order to promote the collection of non-trivial challenging examples, we employ an adversarial human-and-model-in-the-loop data collection procedure. We also present a strong baseline incorporating information from video, dialogue, and commonsense knowledge. Experiments show that each type of information is useful for this challenging task, and that compared to the high human performance on VLEP, our model provides a good starting point but leaves large room for future work. Our dataset and code are available at: https://github.com/jayleicn/VideoLanguageFuturePred
Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations
Liu, Tianqiao, Fang, Qian, Ding, Wenbiao, Wu, Zhongqin, Liu, Zitao
There is an increasing interest in the use of automatic mathematical word problem (MWP) generation in educational assessment. Different from standard natural question generation, MWP generation needs to maintain the underlying mathematical operations between quantities and variables, while at the same time ensuring the relevance between the output and the given topic. To address this problem, we develop an end-to-end neural model to generate personalized and diverse MWPs in real-world scenarios from a commonsense knowledge graph and equations. The proposed model (1) learns both representations from edge-enhanced Levi graphs of symbolic equations and commonsense knowledge; (2) automatically fuses equation and commonsense knowledge information via a self-planning module when generating the MWPs. Experiments on an educational gold-standard set and a large-scale generated MWP set show that our approach is superior on the MWP generation task, and it outperforms the state-of-the-art models in terms of both automatic evaluation metrics, i.e., BLEU-4, ROUGE-L, Self-BLEU, and human evaluation metrics, i.e., equation relevance, topic relevance, and language coherence.
A mathematical word problem (MWP) is a coherent narrative that provides clues to the underlying correct mathematical equations and operations between variables and numerical quantities (Verschaffel et al., 2000; Cetintas et al., 2010; Moyer et al., 1984). Table 1 shows one such problem where students are asked to infer the counts of chickens and rabbits.
Table 1. Mathematical word problem: Chickens and rabbits were in the yard. Together they had 27 heads and 86 legs. How many chickens and rabbits were in the yard? Equations: x + y = 27, 2x + 4y = 86. Solutions: x = 11, y = 16.
In this paper, our objective is to automatically generate well-formed MWPs.
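The chickens-and-rabbits problem quoted above reduces to a two-equation linear system, which can be checked with a short worked example. This is a sketch under stated assumptions: x is taken to count chickens (2 legs each) and y rabbits (4 legs each), and the helper name `solve_2x2` is hypothetical. It illustrates the arithmetic behind the problem, not the paper's generation model.

```python
from fractions import Fraction
from typing import Tuple


def solve_2x2(a1: int, b1: int, c1: int,
              a2: int, b2: int, c2: int) -> Tuple[Fraction, Fraction]:
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


if __name__ == "__main__":
    # Heads: x + y = 27. Legs (assumed 2 per chicken, 4 per rabbit): 2x + 4y = 86.
    chickens, rabbits = solve_2x2(1, 1, 27, 2, 4, 86)
    print(f"chickens = {chickens}, rabbits = {rabbits}")
```

Exact fractions are used so that a non-integer result would show up directly, signalling that the quantities in a word problem do not admit a whole-number answer.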
Beyond Language: Learning Commonsense from Images for Reasoning
Cui, Wanqing, Lan, Yanyan, Pang, Liang, Guo, Jiafeng, Cheng, Xueqi
This paper proposes a novel approach to learn commonsense from images, instead of limited raw texts or costly constructed knowledge bases, for the commonsense reasoning problem in NLP. Our motivation comes from the fact that an image is worth a thousand words, where richer scene information could be leveraged to help distill the commonsense knowledge, which is often hidden in languages. Our approach, namely Loire, consists of two stages. In the first stage, a bi-modal sequence-to-sequence approach is utilized to conduct the scene layout generation task, based on a text representation model ViBERT. In this way, the required visual scene knowledge, such as spatial relations, will be encoded in ViBERT by the supervised learning process with some bi-modal data like COCO. Then ViBERT is concatenated with a pre-trained language model to perform the downstream commonsense reasoning tasks. Experimental results on two commonsense reasoning problems, i.e. commonsense question answering and pronoun resolution, demonstrate that Loire outperforms traditional language-based methods. We also give some case studies to show what knowledge is learned from images and explain how the generated scene layout helps the commonsense reasoning process.
Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries
By extending Cyc's ontology and KB approximately 2%, Cycorp and Cleveland Clinic Foundation (CCF) have built a system to answer clinical researchers' ad hoc queries. The query may be long and complex, hence only partially understood at first, parsed into a set of CycL (higher-order logic) fragments with open variables. But, surprisingly often, after applying various constraints (medical domain knowledge, common sense, discourse pragmatics, syntax), there is only one single way to fit those fragments together, one semantically meaningful formal query P. The system, SRA (for Semantic Research Assistant), dispatches a series of database calls and then combines, logically and arithmetically, their results into answers to P. Seeing the first few answers stream back, the user may realize that they need to abort, modify, and re-ask their query. Even before they push ASK, just knowing approximately how many answers would be returned can spark such editing. Besides real-time ad hoc query-answering, queries can be bundled and persist over time. One bundle of 275 queries is rerun quarterly by CCF to produce the procedures and outcomes data it needs to report to STS (Society of Thoracic Surgeons, an external hospital accreditation and ranking body); another bundle covers ACC (American College of Cardiology) reporting.