Goto

Collaborating Authors

 Commonsense Reasoning


RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms

arXiv.org Artificial Intelligence

Pre-trained language models (PTLM) have impressive performance on commonsense inference benchmarks, but their ability to practically employ commonsense to communicate with humans is fiercely debated. Prior evaluations of PTLMs have focused on factual world knowledge or the ability to reason when the necessary knowledge is provided explicitly. However, effective communication with humans requires inferences based on implicit commonsense relationships, and robustness despite paraphrasing. In the pursuit of advancing fluid human-AI communication, we propose a new challenge, RICA, that evaluates the capabilities of making commonsense inferences and the robustness of these inferences to language variations. In our work, we develop a systematic procedure to probe PTLMs across three different evaluation settings. Extensive experiments on our generated probe sets show that PTLMs perform no better than random guessing (even with fine-tuning), are heavily impacted by statistical biases, and are not robust to perturbation attacks. Our framework and probe sets can help future work improve PTLMs' inference abilities and robustness to linguistic variations--bringing us closer to more fluid communication.


Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models

arXiv.org Artificial Intelligence

Recent works show that pre-trained language models (PTLMs), such as BERT, possess certain commonsense and factual knowledge. They suggest that it is promising to use PTLMs as "neural knowledge bases" via predicting masked words. Surprisingly, we find that this may not work for numerical commonsense knowledge (e.g., a bird usually has two legs). In this paper, we investigate whether and to what extent we can induce numerical commonsense knowledge from PTLMs as well as the robustness of this process. To study this, we introduce a novel probing task with a diagnostic dataset, NumerSense, containing 13.6k masked-word-prediction probes (10.5k for fine-tuning and 3.1k for testing). Our analysis reveals that: (1) BERT and its stronger variant RoBERTa perform poorly on the diagnostic dataset prior to any fine-tuning; (2) fine-tuning with distant supervision brings some improvement; (3) the best supervised model still performs poorly as compared to human performance (54.06% vs 96.3% in accuracy).


An Atlas of Cultural Commonsense for Machine Reasoning

arXiv.org Artificial Intelligence

Existing commonsense reasoning datasets for AI and NLP tasks fail to address an important aspect of human life: cultural differences. In this work, we introduce an approach that extends prior work on crowdsourcing commonsense knowledge by incorporating differences in knowledge that are attributable to cultural or national groups. We demonstrate the technique by collecting commonsense knowledge that surrounds three fairly universal rituals---coming-of-age, marriage, and funerals---across three different national groups: the United States, India, and the Philippines. Our pilot study expands the different types of relationships identified by existing work in the field of commonsense reasoning for commonplace events, and uses these new types to gather information that distinguishes the knowledge of the different groups. It also moves us a step closer towards building a machine that doesn't assume a rigid framework of universal (and likely Western-biased) commonsense knowledge, but rather has the ability to reason in a contextually and culturally sensitive way. Our hope is that cultural knowledge of this sort will lead to more human-like performance in NLP tasks such as question answering (QA) and text understanding and generation.


To Root Artificial Intelligence Deeply in Basic Science for a New Generation of AI

arXiv.org Artificial Intelligence

One of the ambitions of artificial intelligence is to root artificial intelligence deeply in basic science while developing brain-inspired artificial intelligence platforms that will promote new scientific discoveries. The challenges are essential to push artificial intelligence theory and applied technologies research forward. This paper presents the grand challenges of artificial intelligence research for the next 20 years which include:~(i) to explore the working mechanism of the human brain on the basis of understanding brain science, neuroscience, cognitive science, psychology and data science; (ii) how is the electrical signal transmitted by the human brain? What is the coordination mechanism between brain neural electrical signals and human activities? (iii)~to root brain-computer interface~(BCI) and brain-muscle interface~(BMI) technologies deeply in science on human behaviour; (iv)~making research on knowledge-driven visual commonsense reasoning~(VCR), develop a new inference engine for cognitive network recognition~(CNR); (v)~to develop high-precision, multi-modal intelligent perceptrons; (vi)~investigating intelligent reasoning and fast decision-making systems based on knowledge graph~(KG). We believe that the frontier theory innovation of AI, knowledge-driven modeling methodologies for commonsense reasoning, revolutionary innovation and breakthroughs of the novel algorithms and new technologies in AI, and developing responsible AI should be the main research strategies of AI scientists in the future.


Automated Storytelling via Causal, Commonsense Plot Ordering

arXiv.org Artificial Intelligence

Automated story plot generation is the task of generating a coherent sequence of plot events. Causal relations between plot events are believed to increase the perception of story and plot coherence. In this work, we introduce the concept of soft causal relations as causal relations inferred from commonsense reasoning. We demonstrate C2PO, an approach to narrative generation that operationalizes this concept through Causal, Commonsense Plot Ordering. Using human-participant protocols, we evaluate our system against baseline systems with different commonsense reasoning reasoning and inductive biases to determine the role of soft causal relations in perceived story quality. Through these studies we also probe the interplay of how changes in commonsense norms across storytelling genres affect perceptions of story quality.


Commonsense Knowledge in Wikidata

arXiv.org Artificial Intelligence

Wikidata and Wikipedia have been proven useful for reason-ing in natural language applications, like question answering or entitylinking. Yet, no existing work has studied the potential of Wikidata for commonsense reasoning. This paper investigates whether Wikidata con-tains commonsense knowledge which is complementary to existing commonsense sources. Starting from a definition of common sense, we devise three guiding principles, and apply them to generate a commonsense subgraph of Wikidata (Wikidata-CS). Within our approach, we map the relations of Wikidata to ConceptNet, which we also leverage to integrate Wikidata-CS into an existing consolidated commonsense graph. Our experiments reveal that: 1) albeit Wikidata-CS represents a small portion of Wikidata, it is an indicator that Wikidata contains relevant commonsense knowledge, which can be mapped to 15 ConceptNet relations; 2) the overlap between Wikidata-CS and other commonsense sources is low, motivating the value of knowledge integration; 3) Wikidata-CS has been evolving over time at a slightly slower rate compared to the overall Wikidata, indicating a possible lack of focus on commonsense knowledge. Based on these findings, we propose three recommended actions to improve the coverage and quality of Wikidata-CS further.


How Smart is BERT? Evaluating the Language Model's Commonsense Knowledge

#artificialintelligence

In the new paper Does BERT Solve Commonsense Task via Commonsense Knowledge?, a team of researchers from Westlake University, Fudan University and Microsoft Research Asia dive deep into the large language model to discover how it encodes the structured commonsense knowledge it leverages on downstream commonsense tasks. The proven successes of pretrained language models such as BERT on various downstream tasks has stimulated research investigating the linguistic knowledge inside the model. Previous studies have revealed shallow syntactic, semantic and word sense knowledge in BERT, however, the question of how BERT deals with commonsense tasks has been relatively unexamined. CommonsenseQA is a multiple-choice question answering dataset built upon the CONCEPTNET knowledge graph. The researchers extracted multiple target concepts with the same semantic relation to a single source concept from CONCEPTNET, where each question has one of three target concepts as the correct answer. For example, "bird" is the source concept in the question "Where does a wild bird usually live?" and "countryside" is the correct answer from the possible target concepts "cage," "windowsill," and "countryside."


Commonsense Knowledge Graph Reasoning by Selection or Generation? Why?

arXiv.org Artificial Intelligence

Commonsense knowledge graph reasoning(CKGR) is the task of predicting a missing entity given one existing and the relation in a commonsense knowledge graph (CKG). Existing methods can be classified into two categories generation method and selection method. Each method has its own advantage. We theoretically and empirically compare the two methods, finding the selection method is more suitable than the generation method in CKGR. Given the observation, we further combine the structure of neural Text Encoder and Knowledge Graph Embedding models to solve the selection method's two problems, achieving competitive results. We provide a basic framework and baseline model for subsequent CKGR tasks by selection methods.


SemEval-2020 Task 4: Commonsense Validation and Explanation

arXiv.org Artificial Intelligence

In this paper, we present SemEval-2020 Task 4, Commonsense Validation and Explanation (ComVE), which includes three subtasks, aiming to evaluate whether a system can distinguish a natural language statement that makes sense to humans from one that does not, and provide the reasons. Specifically, in our first subtask, the participating systems are required to choose from two natural language statements of similar wording the one that makes sense and the one does not. The second subtask additionally asks a system to select the key reason from three options why a given statement does not make sense. In the third subtask, a participating system needs to generate the reason. We finally attracted 39 teams participating at least one of the three subtasks. For Subtask A and Subtask B, the performances of top-ranked systems are close to that of humans. However, for Subtask C, there is still a relatively large gap between systems and human performance.


CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and Prediction with Multi-task Learning

arXiv.org Artificial Intelligence

This paper describes our system submitted to task 4 of SemEval 2020: Commonsense Validation and Explanation (ComVE) which consists of three sub-tasks. The challenge is to directly validate whether the system can recognize natural language statements that make sense from those that do not, and also require to generate reasonable explanation. Based on BERT architecture with multi-task setting, we propose an effective and interpretable "Explain, Reason and Predict" (ERP) system to solve the three sub-tasks about commonsense: (a) Validation, and (c) Explanation, (b) Reasoning, following the order of the competition. Inspired by cognitive studies of common sense, our system first generate a reason or understanding of the sentences and then choose which one statement makes sense, which is achieved by multi-task learning. The rational experiment validates our assumption and boost the performance. During the post-evaluation, our system has reached 92.9% accuracy in subtask A (rank 11), 89.7% accuracy in subtask B (rank 8), and BLEU score of 12.9 in subtask C (rank 9)