AITopics | Commonsense Reasoning

Commonsense Reasoning

Knowledge that Everyone Knows. "People do not walk on their heads." The assertion comes about 900 statements deep into the 527,308 items that comprise the Open Mind common sense database. It's after "Laws are the rules of society" and before "The sky is blue during the day." This collection of mundane facts, which would take more than 20,000 pages to print out, consists entirely of statements so unremarkable they are barely worth stating. Most of us would correctly dismiss them as common sense.
– from D.C. Denison, Guess who's smarter. Boston Globe Online (page hosted at MIT), May 26, 2003.

Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset

Fang, Tianqing, Wang, Weiqi, Choi, Sehyun, Hao, Shibo, Zhang, Hongming, Song, Yangqiu, He, Bin

arXiv.org Artificial Intelligence, Sep-15-2021

Reasoning over commonsense knowledge bases (CSKBs) whose elements are in the form of free text is an important yet hard task in NLP. While CSKB completion only fills the missing links within the domain of the CSKB, CSKB population is alternatively proposed with the goal of reasoning about unseen assertions from external resources. In this task, CSKBs are grounded to a large-scale eventuality (activity, state, and event) graph to discriminate whether novel triples from the eventuality graph are plausible or not. However, existing evaluations on the population task are either not accurate (automatic evaluation with randomly sampled negative examples) or of small scale (human annotation). In this paper, we benchmark the CSKB population task with a new large-scale dataset by first aligning four popular CSKBs, and then presenting a high-quality human-annotated evaluation set to probe neural models' commonsense reasoning ability. We also propose a novel inductive commonsense reasoning model that reasons over graphs. Experimental results show that generalizing commonsense reasoning on unseen assertions is inherently a hard task. Models achieving high accuracy during training perform poorly on the evaluation set, with a large gap from human performance. Code and data are publicly available at https://github.com/HKUST-KnowComp/CSKB-Population.

aser, personx, relation, (14 more...)

arXiv.org Artificial Intelligence

2109.07679

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)

Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

Yin, Da, Li, Liunian Harold, Hu, Ziniu, Peng, Nanyun, Chang, Kai-Wei

arXiv.org Artificial Intelligence, Sep-14-2021

Commonsense is defined as the knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic locations and they are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art vision-and-language models, VisualBERT and ViLBERT, trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions including East Asia, South Asia, and Africa is significantly lower than that for Western regions. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition. Dataset and code are released at https://github.com/WadeYin9712/GD-VCR.

artificial intelligence, natural language, qa pair, (14 more...)

arXiv.org Artificial Intelligence

2109.0686

Country:

Asia > East Asia (0.25)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(13 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.66)
Government > Regional Government > North America Government > United States Government (0.46)
Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

STaCK: Sentence Ordering with Temporal Commonsense Knowledge

Ghosal, Deepanway, Majumder, Navonil, Mihalcea, Rada, Poria, Soujanya

arXiv.org Artificial Intelligence, Sep-6-2021

Sentence order prediction is the task of finding the correct order of sentences in a randomly ordered document. Correctly ordering the sentences requires an understanding of coherence with respect to the chronological sequence of events described in the text. Document-level contextual understanding and commonsense knowledge centered around these events are often essential in uncovering this coherence and predicting the exact chronological order. In this paper, we introduce STaCK -- a framework based on graph neural networks and temporal commonsense knowledge to model global information and predict the relative order of sentences. Our graph network accumulates temporal evidence using knowledge of 'past' and 'future' and formulates sentence ordering as a constrained edge classification problem. We report results on five different datasets, and empirically show that the proposed method is naturally suitable for order prediction. The implementation of this work is publicly available at: https://github.com/declare-lab/sentence-ordering.
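To make the idea of predicting the relative order of sentences concrete, here is a minimal Python sketch of one way pairwise "comes before" decisions could be turned into a full ordering. It is not the STaCK model or its decoding procedure: the graph neural network is replaced by a hypothetical `precedes` callback, and the win-counting rule and all names below are assumptions made for illustration.

```python
from itertools import combinations


def order_from_pairwise(n, precedes):
    """Rank n sentences from pairwise relative-order decisions.

    `precedes(i, j)` is a hypothetical stand-in for a learned edge
    classifier: it returns True when sentence i is predicted to come
    before sentence j. Each sentence earns one "win" for every other
    sentence it is predicted to precede; sorting by wins gives an order.
    """
    wins = [0] * n
    for i, j in combinations(range(n), 2):
        if precedes(i, j):
            wins[i] += 1
        else:
            wins[j] += 1
    # More wins means the sentence precedes more of the others.
    return sorted(range(n), key=lambda i: -wins[i])


# Toy usage: an oracle that knows the true position of each shuffled sentence.
true_position = [2, 0, 3, 1]


def oracle(i, j):
    return true_position[i] < true_position[j]


predicted_order = order_from_pairwise(len(true_position), oracle)
```

Counting wins tolerates a few inconsistent pairwise predictions, which a strict topological sort would reject as a cycle; that is the only reason it was chosen for this sketch.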

information, node, relative order, (16 more...)

arXiv.org Artificial Intelligence

2109.02247

Country:

Asia > Singapore (0.04)
North America > United States > Michigan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.82)

CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge

Onoe, Yasumasa, Zhang, Michael J. Q., Choi, Eunsol, Durrett, Greg

arXiv.org Artificial Intelligence, Sep-3-2021

Most benchmark datasets targeting commonsense reasoning focus on everyday scenarios: physical knowledge like knowing that you could fill a cup under a waterfall [Talmor et al., 2019], social knowledge like bumping into someone is awkward [Sap et al., 2019], and other generic situations. However, there is a rich space of commonsense inferences anchored to knowledge about specific entities: for example, deciding the truthfulness of a claim "Harry Potter can teach classes on how to fly on a broomstick." Can models learn to combine entity knowledge with commonsense reasoning in this fashion? We introduce CREAK, a testbed for commonsense reasoning about entity knowledge, bridging fact-checking about entities (Harry Potter is a wizard and is skilled at riding a broomstick) with commonsense inferences (if you're good at a skill you can teach others how to do it). Our dataset consists of 13k human-authored English claims about entities that are either true or false, in addition to a small contrast set. Crowdworkers can easily come up with these statements and human performance on the dataset is high (high 90s); we argue that models should be able to blend entity knowledge and commonsense reasoning to do well here. In our experiments, we focus on the closed-book setting and observe that a baseline model finetuned on an existing fact verification benchmark struggles on CREAK. Training a model on CREAK improves accuracy by a substantial margin, but still falls short of human performance. Our benchmark provides a unique probe into natural language understanding models, testing both their ability to retrieve facts (e.g., who teaches at the University of Chicago?) and unstated commonsense knowledge (e.g., butlers do not yell at guests).

computational linguistic, dataset, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2109.01653

Country:

North America > United States > Illinois > Cook County > Chicago (0.24)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.05)
(6 more...)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Technical Perspective: The Importance of WINOGRANDE

Communications of the ACM, Aug-24-2021, 22:51:00 GMT

Excelling at a test often does not translate into excelling at the skills the test purports to measure. This is true not only of humans but also of AI systems, and the more so the greater the claims of the test's significance. This became evident less than a decade after the introduction of the Winograd Schema Challenge (WSC) [3], a test designed to measure an AI system's commonsense reasoning (CSR) ability by answering simple questions. An example would be, given the information "The sculpture rolled off the shelf because it wasn't anchored," answering the question "What wasn't anchored?" There are multiple AI systems [2] that achieve human performance on the WSC but are not capable of performing CSR.

benchmark, csr ability, winogrande, (15 more...)

Communications of the ACM

AI-Alerts: 2021 > 2021-08 > AAAI AI-Alert for Aug 31, 2021 (1.00)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

WinoGrande

Communications of the ACM, Aug-24-2021, 04:00:00 GMT

Commonsense reasoning remains a major challenge in AI, and yet recent progress on benchmarks may seem to suggest otherwise. In particular, recent neural language models have reported above 90% accuracy on the Winograd Schema Challenge (WSC) [22], a commonsense benchmark originally designed to be unsolvable for statistical models that rely simply on word associations. This raises an important question: whether these models have truly acquired robust commonsense capabilities or whether they rely on spurious biases in the dataset that lead to an overestimation of the true capabilities of machine commonsense. To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) large-scale crowdsourcing, followed by (2) systematic bias reduction using a novel AFLITE algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Our experiments demonstrate that state-of-the-art models achieve considerably lower accuracy (59.4%-79.1%) on WinoGrande. Furthermore, we report new state-of-the-art results on five related benchmarks with emphasis on their dual implications. On the one hand, they demonstrate the effectiveness of WinoGrande when used as a resource for transfer learning. On the other hand, the high performance on all these benchmarks suggests the extent to which spurious biases are prevalent in all such datasets, which motivates further research on algorithmic bias reduction. Commonsense reasoning has been a long-standing open research question in AI [5]. The Winograd Schema Challenge (WSC) [22], proposed as an alternative to the Turing Test [39], has been regarded as a prototypical benchmark to test commonsense capabilities in AI.
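The bias-reduction step can be pictured as adversarial filtering: instances that a simple classifier predicts correctly from their embeddings alone are treated as carrying a give-away signal and are dropped. The Python sketch below is a simplified illustration of that idea, not the AFLITE implementation. It assumes each instance is already a precomputed embedding paired with a label, swaps in a nearest-centroid classifier for whatever model the authors use, and every function name and parameter value (`n_rounds`, `train_size`, `cutoff`, `tau`) is hypothetical.

```python
import random


def fit_centroids(train):
    """Stand-in classifier (an assumption): one mean vector per label."""
    sums, counts = {}, {}
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def filter_predictable(data, n_rounds=20, train_size=8, cutoff=4, tau=0.75, seed=0):
    """Return indices of instances kept after iterative filtering.

    data: list of (embedding, label) pairs. In each pass, instances are
    scored by how often they are classified correctly when held out;
    up to `cutoff` instances scoring at least `tau` are removed.
    """
    rng = random.Random(seed)
    kept = list(range(len(data)))
    while len(kept) > train_size:
        votes = {i: [] for i in kept}
        for _ in range(n_rounds):
            shuffled = kept[:]
            rng.shuffle(shuffled)
            train_ids, held_out = shuffled[:train_size], shuffled[train_size:]
            model = fit_centroids([data[i] for i in train_ids])
            for i in held_out:
                votes[i].append(predict(model, data[i][0]) == data[i][1])
        # Predictability score: fraction of held-out predictions that were correct.
        score = {i: sum(v) / len(v) for i, v in votes.items() if v}
        flagged = sorted((i for i in score if score[i] >= tau),
                         key=lambda i: -score[i])[:cutoff]
        kept = [i for i in kept if i not in flagged]
        if len(flagged) < cutoff:
            break
    return kept
```

Raising `tau` makes the filter more conservative; a value above 1.0 removes nothing, since no predictability score can exceed 1.0.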

benchmark, dataset, winogrande, (13 more...)

Communications of the ACM

Country:

North America > United States > Washington > King County > Seattle (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)
Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)

Interpretable Visual Understanding with Cognitive Attention Network

Tang, Xuejiao, Zhang, Wenbin, Yu, Yi, Turner, Kea, Derr, Tyler, Wang, Mengyu, Ntoutsi, Eirini

arXiv.org Artificial Intelligence, Aug-14-2021

While image understanding at the recognition level has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding not only at the recognition level but also at the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN

commonsense, information, representation, (12 more...)

arXiv.org Artificial Intelligence

2108.02924

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Yolo County > Davis (0.04)
Europe > Germany > Lower Saxony > Hanover (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.76)
(2 more...)

Leveraging Commonsense Knowledge on Classifying False News and Determining Checkworthiness of Claims

Schlicht, Ipek Baris, Sezerer, Erhan, Tekir, Selma, Han, Oul, Boukhers, Zeyd

arXiv.org Artificial Intelligence, Aug-8-2021

Widespread and rapid dissemination of false news has made fact-checking an indispensable requirement. Given its time-consuming and labor-intensive nature, the task calls for automated support to meet the demand. In this paper, we propose to leverage commonsense knowledge for the tasks of false news classification and check-worthy claim detection. Arguing that commonsense knowledge is a factor in human believability, we fine-tune the BERT language model with a commonsense question answering task and the aforementioned tasks in a multi-task learning environment. For predicting fine-grained false news types, we compare the proposed fine-tuned model's performance with the false news classification models on a public dataset as well as a newly collected dataset. To evaluate check-worthy claim detection, we compare the model's performance with the single-task BERT model and a state-of-the-art check-worthy claim detection tool. Our experimental analysis demonstrates that commonsense knowledge can improve performance in both tasks.

dataset, detection, knowledge, (15 more...)

arXiv.org Artificial Intelligence

2108.03731

Country:

North America > United States (0.46)
Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

For AI to grow up, it needs to learn on its own

#artificialintelligence, Aug-7-2021, 04:25:06 GMT

In contrast, despite remarkable recent advances, artificial intelligence systems still rely disproportionately and often wholly on learning with supervision. And even the most knowledgeable AI agents can lack the ability to apply common sense reasoning. For example, a question such as "how long would it take to swim to the moon?" may elicit an "I don't know" instead of "you cannot swim to the moon." In seeking to advance AI to the next level of performance, researchers today are starting to explore foundational elements of generalizability and autonomous learning. For example, recently they've been exploring increasingly larger neural network models for language processing and computer vision tasks.

supervision

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Vision (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Liu, Pengfei, Yuan, Weizhe, Fu, Jinlan, Jiang, Zhengbao, Hayashi, Hiroaki, Neubig, Graham

arXiv.org Artificial Intelligence, Jul-28-2021

This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x' that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only make a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website http://pretrain.nlpedia.ai/ including a constantly updated survey and paper list.
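The pipeline described above (wrap the input in a template, let a model fill the slot, map the filled string to an output) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: no real language model is called, `toy_score` is a hypothetical stand-in for a model's text probability, and the template, cue words, and answer-to-label mapping are invented for the example.

```python
TEMPLATE = "{x} Overall, it was a [Z] movie."      # prompt with one unfilled slot
ANSWER_TO_LABEL = {"great": "positive", "terrible": "negative"}

# Hypothetical cue words used only by the toy scorer below.
POSITIVE_CUES = {"love", "enjoyed", "wonderful"}
NEGATIVE_CUES = {"hate", "boring", "awful"}


def apply_template(x, template=TEMPLATE):
    """Turn the raw input x into a prompt x' that contains the slot [Z]."""
    return template.format(x=x)


def toy_score(text):
    """Stand-in for a language model's score of a complete string.

    A real system would query a pre-trained model here; this toy version
    rewards strings whose filled-in answer agrees with the cue words.
    """
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    pos = len(words & POSITIVE_CUES)
    neg = len(words & NEGATIVE_CUES)
    if "great" in words:
        return pos - neg
    if "terrible" in words:
        return neg - pos
    return 0


def fill_slot(prompt, candidates, score=toy_score):
    """Pick the candidate answer whose filled-in string scores highest."""
    return max(candidates, key=lambda z: score(prompt.replace("[Z]", z)))


def classify(x):
    """Full pipeline: input -> prompt -> filled answer -> output label."""
    prompt = apply_template(x)
    answer = fill_slot(prompt, list(ANSWER_TO_LABEL))
    return ANSWER_TO_LABEL[answer]
```

Because the task is expressed entirely through the template and the answer mapping, switching to a different task in this sketch means changing those two pieces rather than retraining anything, which is the property the abstract highlights for few-shot and zero-shot use.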

computational linguistic, (14 more...)

arXiv.org Artificial Intelligence

2107.13586

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
(29 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)
