Commonsense Reasoning
Nuance is no longer sponsoring the competition, and the $25,000 prize mentioned below is no longer offered. The challenge lives on, however, in the many research groups currently (as of 2019) working on aspects of the problem at Microsoft Research, Facebook, and the Allen Institute, among other places. Commonsense Reasoning is keen to promote the Winograd Schema Challenge and Nuance Communications' competition to pass an alternative to the Turing Test. Background: The Turing Test is intended to serve as a test of whether a machine has achieved human-level intelligence. In one of its best-known versions, a person attempts to determine whether he or she is conversing (via text) with a human or a machine.
AI still doesn't have the common sense to understand human language
Until pretty recently, computers were hopeless at producing sentences that actually made sense. But the field of natural-language processing (NLP) has taken huge strides, and machines can now generate convincing passages with the push of a button. These advances have been driven by deep-learning techniques, which pick out statistical patterns in word usage and argument structure from vast troves of text. But a new paper from the Allen Institute for Artificial Intelligence calls attention to something still missing: machines don't really understand what they're writing (or reading). This is a fundamental challenge in the grand pursuit of generalizable AI, but beyond academia it is relevant for consumers, too. Chatbots and voice assistants built on state-of-the-art natural-language models, for example, have become the interface for many financial institutions, health-care providers, and government agencies.
Joint Reasoning for Multi-Faceted Commonsense Knowledge
Chalier, Yohan, Razniewski, Simon, Weikum, Gerhard
Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior work on acquiring CSK, such as ConceptNet, has compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative measure (or ranking) of properties is a confidence score that the statement is valid. This paper aims to overcome these limitations by introducing a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements. Our model captures four different dimensions of CSK statements: plausibility, typicality, remarkability and salience, with scoring and ranking along each dimension. For example, hyenas drinking water is typical but not salient, whereas hyenas eating carcasses is salient. For reasoning and ranking, we develop a method with soft constraints that couples the inference over concepts related in a taxonomic hierarchy. The reasoning is cast as an integer linear program (ILP), and we leverage the theory of reduced costs of a relaxed LP to compute informative rankings. This methodology is applied to several large CSK collections. Our evaluation shows that we can consolidate these inputs into much cleaner and more expressive knowledge. Results are available at https://dice.mpi-inf.mpg.de.
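To make the multi-faceted idea concrete, here is a minimal sketch, assuming a CSK statement is simply a concept, a property, and one score per facet. The class name `CskStatement`, the function `rank`, and all numeric scores are hypothetical and chosen only to mirror the hyena example; the sketch does not attempt the paper's ILP-based joint reasoning across the taxonomy.

```python
from dataclasses import dataclass

FACETS = ("plausibility", "typicality", "remarkability", "salience")


@dataclass(frozen=True)
class CskStatement:
    """One concept-property statement with a score per facet (hypothetical)."""
    concept: str
    prop: str
    plausibility: float
    typicality: float
    remarkability: float
    salience: float


def rank(statements, facet):
    """Sort statements by a single facet, highest score first."""
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet}")
    return sorted(statements, key=lambda s: getattr(s, facet), reverse=True)


# Illustrative scores only, chosen to mirror the hyena example in the abstract.
hyena = [
    CskStatement("hyena", "drinks water", 0.95, 0.90, 0.05, 0.10),
    CskStatement("hyena", "eats carcasses", 0.90, 0.80, 0.60, 0.85),
]

for facet in FACETS:
    print(f"{facet:>13}: {rank(hyena, facet)[0].prop}")
```

Keeping the facets as separate fields, rather than folding them into one confidence value, is what allows the same statement to rank high on typicality and low on salience.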
Using ConceptNet to Teach Common Sense to an Automated Theorem Prover
Schon, Claudia, Siebert, Sophie, Stolzenburg, Frieder
In recent years, numerous benchmarks for commonsense reasoning have been presented which cover different areas: the Choice of Plausible Alternatives Challenge (COPA) [17] requires causal reasoning in everyday situations, the Winograd Schema Challenge [8] addresses difficult cases of pronoun disambiguation, the TriangleCOPA Challenge [9] focuses on human relationships and emotions, and the Story Cloze Test with the ROCStories Corpora [11] focuses on the ability to determine a plausible ending for a given short story, to name just a few. In our system, we focus on the COPA challenge, where each problem consists of a problem description (the premise), a question, and two answer candidates (called alternatives). See Figure 1 for an example. Most approaches tackling these problems are based on machine learning or exploit statistical properties of the natural language input (see e.g.
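The COPA problem format described above (premise, question, two alternatives) can be sketched as a small data structure plus a naive knowledge-based scorer. This is only an illustration under stated assumptions: the `RELATED` word pairs are hypothetical stand-ins for ConceptNet edges, the overlap-counting `score` function is not the authors' theorem-prover approach, and the toy scorer ignores whether the question asks for a cause or an effect.

```python
import re
from dataclasses import dataclass


@dataclass
class CopaProblem:
    premise: str
    question: str  # "cause" or "effect"; ignored by this toy scorer
    alternatives: tuple


# Hypothetical word-level associations standing in for ConceptNet edges.
# A real system would query ConceptNet; these pairs are for illustration only.
RELATED = {("broke", "dropped"), ("broke", "hammer"), ("toe", "foot")}


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def score(premise, alternative):
    """Count premise/alternative word pairs linked by an association (either direction)."""
    return sum(
        1
        for p in words(premise)
        for a in words(alternative)
        if (p, a) in RELATED or (a, p) in RELATED
    )


def choose(problem):
    """Return the index (0 or 1) of the higher-scoring alternative."""
    s0, s1 = (score(problem.premise, alt) for alt in problem.alternatives)
    return 0 if s0 >= s1 else 1


problem = CopaProblem(
    premise="The man broke his toe.",
    question="cause",
    alternatives=("He dropped a hammer on his foot.", "He got a hole in his sock."),
)
print(choose(problem))  # → 0
```

The first alternative links to the premise through three associations and the second through none, so the scorer picks the first; with real ConceptNet data, coverage gaps and noisy edges make this far less clear-cut.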
A Logical Model for Supporting Social Commonsense Knowledge Acquisition
Gu, Zhenzhen, Cao, Cungen, Wang, Ya, Sui, Yuefei
To make machines exhibit human-like abilities in domains like robotics and conversation, social commonsense knowledge (SCK), i.e., common sense about social contexts and social roles, is essential. Therefore, our ultimate goal is to acquire large-scale SCK to support much more intelligent applications. Before that, we need to know clearly what SCK is and how to represent it, since automatic information processing requires that data and knowledge be organized in structured and semantically related ways. For this reason, in this paper, we identify and formalize three basic types of SCK based on first-order theory. First, we identify and formalize the interrelationships, such as having-role and having-social_relation, among social contexts, roles and players, treating both contexts and roles as first-class citizens and not generating role instances. Second, we provide a four-level structure to identify and formalize the intrinsic information, such as events and desires, of social contexts, roles and players, and illustrate how to harvest the intrinsic information of social contexts and roles from the behavior of players in concrete contexts. Third, motivated by observations of actual contexts, we introduce and formalize the embedding of social contexts, and describe how to extract the intrinsic information of social contexts and roles from the smaller and simpler contexts embedded in them. The results of this paper lay the foundation not only for formalizing much more complex SCK but also for acquiring these three basic types of SCK.
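A first-order rule over contexts, roles and players can be illustrated with a tiny forward-chaining step. This is a hypothetical sketch, not the paper's formalization: the predicate names loosely follow the abstract's having-role and having-social_relation, while `INSTANCE_OF`, `PLAYS`, every fact, and the single rule are assumptions made for illustration.

```python
# Hypothetical facts; predicate names loosely follow the abstract's
# having-role and having-social_relation, everything else is illustrative.
HAS_ROLE = {
    ("family", "parent"), ("family", "child"),
    ("company", "employer"), ("company", "employee"),
}
# (context type, role 1, role 2, relation that holds from role 1 to role 2)
HAS_SOCIAL_RELATION = {
    ("family", "parent", "child", "cares_for"),
    ("company", "employer", "employee", "pays"),
}
# Concrete contexts and the context type each one instantiates.
INSTANCE_OF = {"smith_home": "family", "acme_office": "company"}
# (player, role, concrete context): players fill roles without role instances.
PLAYS = {
    ("alice", "parent", "smith_home"), ("bob", "child", "smith_home"),
    ("dave", "employer", "acme_office"), ("carol", "employee", "acme_office"),
}


def derive_player_relations(has_role, has_rel, instance_of, plays):
    """Apply one rule: has_social_relation(T, R1, R2, S), plays(P1, R1, C),
    plays(P2, R2, C), instance_of(C, T)  =>  S(P1, P2)."""
    derived = set()
    for ctx_type, r1, r2, rel in has_rel:
        # Skip relations between roles the context type does not declare.
        if (ctx_type, r1) not in has_role or (ctx_type, r2) not in has_role:
            continue
        for p1, role1, c1 in plays:
            for p2, role2, c2 in plays:
                if (c1 == c2 and instance_of.get(c1) == ctx_type
                        and role1 == r1 and role2 == r2):
                    derived.add((p1, rel, p2))
    return derived


print(sorted(derive_player_relations(HAS_ROLE, HAS_SOCIAL_RELATION, INSTANCE_OF, PLAYS)))
```

Because relations are stated once at the level of context types and roles, each new player only needs a `PLAYS` fact for the appropriate relations to follow.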
That and There: Judging the Intent of Pointing Actions with Robotic Arms
Alikhani, Malihe, Khalid, Baber, Shome, Rahul, Mitash, Chaitanya, Bekris, Kostas, Stone, Matthew
Collaborative robotics requires effective communication between a robot and a human partner. This work proposes a set of interpretive principles for how a robotic arm can use pointing actions to communicate task information to people, extending existing models from the related literature. These principles are evaluated through studies where English-speaking human subjects view animations of simulated robots instructing pick-and-place tasks. The evaluation distinguishes two classes of pointing actions that arise in pick-and-place tasks: referential pointing (identifying objects) and locating pointing (identifying locations). The study indicates that human subjects show greater flexibility in interpreting the intent of referential pointing than of locating pointing, which needs to be more deliberate. The results also demonstrate the effects of variation in the environment and task context on the interpretation of pointing. Our corpus, experiments and design principles advance models of context, commonsense reasoning and embodied communication.
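The two classes of pointing can be made concrete with simple geometry. The sketch below is a hypothetical illustration, not the authors' interpretive model: it assumes a pointing ray given by an origin and a direction, reads locating pointing as the ray's intersection with a flat table at height `table_z`, and reads referential pointing as the object whose direction from the pointer makes the smallest angle with the ray.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def norm(a):
    return math.sqrt(dot(a, a))


def locate_on_table(origin, direction, table_z=0.0):
    """Locating pointing: the point where the ray meets the plane z = table_z."""
    if abs(direction[2]) < 1e-9:
        return None  # ray is parallel to the table
    t = (table_z - origin[2]) / direction[2]
    if t <= 0:
        return None  # table lies behind the pointer
    return tuple(o + t * d for o, d in zip(origin, direction))


def refer_to_object(origin, direction, objects):
    """Referential pointing: the named object with the smallest angle to the ray."""
    def angle(pos):
        v = sub(pos, origin)
        cos = dot(v, direction) / (norm(v) * norm(direction))
        return math.acos(max(-1.0, min(1.0, cos)))  # clamp for rounding error
    return min(objects, key=lambda name: angle(objects[name]))


origin, direction = (0.0, 0.0, 1.0), (1.0, 0.0, -1.0)
objects = {"cup": (1.0, 0.1, 0.0), "block": (0.0, 1.0, 0.0)}
print(locate_on_table(origin, direction))  # → (1.0, 0.0, 0.0)
print(refer_to_object(origin, direction, objects))  # → cup
```

In this toy reading, referential pointing only has to make one object the best angular match, whereas locating pointing commits to an exact point, which is one intuitive reason the latter might need to be more deliberate.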
Evaluating Commonsense in Pre-trained Language Models
Zhou, Xuhui, Zhang, Yue, Cui, Leyang, Huang, Dandan
Contextualized representations trained over large raw text data have given remarkable improvements for NLP tasks including question answering and reading comprehension. Prior work has shown that syntactic, semantic and word-sense knowledge are contained in such representations, which explains why they benefit such tasks. However, relatively little work has investigated the commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension. We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models' commonsense ability, while bidirectional context and a larger training set bring further gains. We additionally find that current models do poorly on tasks that require more inference steps. Finally, we test the robustness of models by making dual test cases, which are correlated so that the correct prediction of one sample should lead to the correct prediction of the other. Interestingly, the models show confusion on these test cases, which suggests that they learn commonsense at the surface rather than the deep level. We publicly release the test set, named CATs, for future research.
Introduction
Contextualized representations trained over large-scale text data have given remarkable improvements to a wide range of NLP tasks, including natural language inference (Bowman et al. 2015), question answering (Rajpurkar, Jia, and Liang 2018) and reading comprehension (Lai et al. 2017). Since these representations give new state-of-the-art results that approach or surpass human performance on several benchmark datasets, it is natural to ask what types of knowledge they learn, in order to better understand how they benefit the NLP problems above.
Intuitively, commonsense knowledge is at least as useful as semantic and syntactic knowledge for natural language inference, reading comprehension and coreference resolution.
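The dual-test idea can be sketched as a pair-level metric: a pair counts as solved only when both of its correlated cases are answered correctly. The function names and the correctness values below are hypothetical, and the paper's exact scoring may differ; the point is the gap between item-level and pair-level accuracy.

```python
def item_accuracy(pairs):
    """Accuracy over individual test cases, ignoring the pairing."""
    flat = [ok for pair in pairs for ok in pair]
    return sum(flat) / len(flat)


def pair_accuracy(pairs):
    """A dual pair counts only if both of its cases are answered correctly."""
    return sum(1 for a, b in pairs if a and b) / len(pairs)


# Hypothetical per-case correctness for four dual pairs.
results = [(True, True), (True, False), (False, True), (True, True)]
print(item_accuracy(results))  # → 0.75
print(pair_accuracy(results))  # → 0.5
```

A model that genuinely captured the shared commonsense fact would tend to get both members of a pair right or both wrong. If its answers within a pair were independent coin flips, pair accuracy would fall to about 0.25 while item accuracy stayed near 0.5.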
PIQA: Reasoning about Physical Commonsense in Natural Language
Bisk, Yonatan, Zellers, Rowan, Bras, Ronan Le, Gao, Jianfeng, Choi, Yejin
To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions requiring this kind of physical commonsense pose a challenge to today's natural language understanding systems. While recent pretrained models (such as BERT) have made progress on question answering over more abstract domains, such as news articles and encyclopedia entries where text is plentiful, text about more physical domains is inherently limited due to reporting bias. Can AI systems learn to reliably answer physical commonsense questions without experiencing the physical world? In this paper, we introduce the task of physical commonsense reasoning and a corresponding benchmark dataset, Physical Interaction: Question Answering (PIQA). Though humans find the dataset easy (95% accuracy), large pretrained models struggle (77%). We provide an analysis of the dimensions of knowledge that existing models lack, which offers significant opportunities for future research.
Top k Memory Candidates in Memory Networks for Common Sense Reasoning
Successful completion of a reasoning task requires the agent to have relevant prior knowledge or some given context about the dynamics of the world. Usually, the only information provided to the system for a reasoning task is the query or a supporting story, which is often not enough for commonsense reasoning tasks. The goal is that, if the information provided with the question is not sufficient to answer it correctly, the model should select the k most relevant documents to aid its inference process. In this work, the model dynamically selects the top k most relevant memory candidates for solving reasoning tasks. Experiments on a subset of Winograd Schema Challenge (WSC) problems show that the proposed model has potential for commonsense reasoning. The WSC is a test of machine intelligence designed as an improvement on the Turing test.
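Top-k memory selection can be sketched in the style of end-to-end memory networks: score each memory by its dot product with the query, normalize the scores with a softmax, and keep the k highest. Whether the paper scores memories exactly this way is an assumption, and the query and memory vectors below are hypothetical embeddings, not learned ones.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_memories(query, memories, k):
    """Return (index, attention weight) for the k memories most similar to the query."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memories]
    probs = softmax(scores)
    ranked = sorted(range(len(memories)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Hypothetical embeddings for one query and four stored memory candidates.
query = [1.0, 0.0, 1.0]
memories = [[0.9, 0.1, 0.8], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5], [1.0, 0.0, 0.9]]
print([i for i, _ in top_k_memories(query, memories, k=2)])  # → [3, 0]
```

The softmax does not change the ranking, but it yields weights that a downstream layer could use to combine the selected memories instead of treating them equally.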
CommonGen: A Constrained Text Generation Dataset Towards Generative Commonsense Reasoning
Lin, Bill Yuchen, Shen, Ming, Xing, Yu, Zhou, Pei, Ren, Xiang
Rational humans can generate sentences that cover a certain set of concepts while describing natural and common scenes. For example, given {apple(noun), tree(noun), pick(verb)}, humans can easily come up with scenes like "a boy is picking an apple from a tree" via their generative commonsense reasoning ability. However, we find this capacity has not been well learned by machines. Most prior work in machine commonsense focuses on discriminative reasoning tasks in a multiple-choice question answering setting. Herein, we present CommonGen: a challenging dataset for testing generative commonsense reasoning with a constrained text generation task. We collect 37k concept-sets as inputs and 90k human-written sentences as associated outputs. We also provide high-quality rationales from the human annotators behind the reasoning process for the development and test sets. We demonstrate the difficulty of the task by examining a wide range of sequence generation methods with both automatic metrics and human evaluation. The state-of-the-art pre-trained generation model, UniLM, is still far from human performance on this task. Our data and code are publicly available at http://inklab.usc.edu/CommonGen/ .
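One simple automatic check for this constrained generation task is concept coverage: what fraction of the input concepts appear in the generated sentence. The sketch below is an assumption, not the paper's metric; in particular, `crude_lemma` is a hypothetical suffix stripper that a real evaluation would replace with a proper lemmatizer.

```python
import re


def crude_lemma(word):
    """Very rough, hypothetical suffix stripper (e.g. picking -> pick, apples -> apple)."""
    for suffix in ("ing", "ed", "s"):
        # Require a stem of at least 3 letters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def concept_coverage(concepts, sentence):
    """Fraction of input concepts whose crude lemma appears in the sentence."""
    lemmas = {crude_lemma(w) for w in re.findall(r"[a-z]+", sentence.lower())}
    covered = [c for c in concepts if crude_lemma(c) in lemmas]
    return len(covered) / len(concepts)


concepts = ["apple", "tree", "pick"]
print(concept_coverage(concepts, "A boy is picking an apple from a tree."))  # → 1.0
```

Coverage only checks that the constraints are satisfied. Whether the sentence describes a plausible everyday scene still requires other metrics or human judgment, which is the harder, commonsense part of the task.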