AITopics | Commonsense Reasoning

Collaborating Authors

Commonsense Reasoning

Knowledge that Everyone Knows. "People do not walk on their heads." The assertion comes about 900 statements deep into the 527,308 items that comprise the Open Mind common sense database. It's after "Laws are the rules of society" and before "The sky is blue during the day." This collection of mundane facts, which would take more than 20,000 pages to print out, consists entirely of statements so unremarkable they are barely worth stating. Most of us would correctly dismiss them as common sense.
– from D.C. Denison, Guess who's smarter. Boston Globe Online (page hosted at MIT), May 26, 2003.

News Overviews Instructional Materials AI-Alerts Classics

Gold: A Global and Local-aware Denoising Framework for Commonsense Knowledge Graph Noise Detection

Deng, Zheye, Wang, Weiqi, Wang, Zhaowei, Liu, Xin, Song, Yangqiu

arXiv.org Artificial IntelligenceOct-18-2023

Commonsense Knowledge Graphs (CSKGs) are crucial for commonsense reasoning, yet constructing them through human annotations can be costly. As a result, various automatic methods have been proposed to construct CSKG with larger semantic coverage. However, these unsupervised approaches introduce spurious noise that can lower the quality of the resulting CSKG, which cannot be tackled easily by existing denoising algorithms due to the unique characteristics of nodes and structures in CSKGs. To address this issue, we propose Gold (Global and Local-aware Denoising), a denoising framework for CSKGs that incorporates entity semantic information, global rules, and local structural information from the CSKG. Experiment results demonstrate that Gold outperforms all baseline methods in noise detection tasks on synthetic noisy CSKG benchmarks. Furthermore, we show that denoising a real-world CSKG is effective and even benefits the downstream zero-shot commonsense question-answering task.

commonsense knowledge graph noise detection, global and local-aware denoising framework, gold

arXiv.org Artificial Intelligence

2310.12011

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)

Add feedback

Being Right for Whose Right Reasons?

Jakobsen, Terne Sasha Thorn, Cabello, Laura, Søgaard, Anders

arXiv.org Artificial IntelligenceOct-13-2023

Explainability methods are used to benchmark the extent to which model predictions align with human rationales i.e., are 'right for the right reasons'. Previous work has failed to acknowledge, however, that what counts as a rationale is sometimes subjective. This paper presents what we think is a first of its kind, a collection of human rationale annotations augmented with the annotators demographic information. We cover three datasets spanning sentiment analysis and common-sense reasoning, and six demographic groups (balanced across age and ethnicity). Such data enables us to ask both what demographics our predictions align with and whose reasoning patterns our models' rationales align with. We find systematic inter-group annotator disagreement and show how 16 Transformer-based models align better with rationales provided by certain demographic groups: We find that models are biased towards aligning best with older and/or white annotators. We zoom in on the effects of model size and model distillation, finding -- contrary to our expectations -- negative correlations between model size and rationale agreement as well as no evidence that either model size or model distillation improves fairness.

agreement, annotator, rationale, (16 more...)

arXiv.org Artificial Intelligence

2306.00639

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > New York > New York County > New York City (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Media > Film (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Add feedback

Advancing Transformer's Capabilities in Commonsense Reasoning

Zhou, Yu, Han, Yunqiu, Zhou, Hanyu, Wu, Yulun

arXiv.org Artificial IntelligenceOct-10-2023

Recent advances in general purpose pre-trained language models have shown great potential in commonsense reasoning. However, current works still perform poorly on standard commonsense reasoning benchmarks including the Com2Sense Dataset. We argue that this is due to a disconnect with current cutting-edge machine learning methods. In this work, we aim to bridge the gap by introducing current ML-based methods to improve general purpose pre-trained language models in the task of commonsense reasoning. Specifically, we experiment with and systematically evaluate methods including knowledge transfer, model ensemble, and introducing an additional pairwise contrastive objective. Our best model outperforms the strongest previous works by ~15\% absolute gains in Pairwise Accuracy and ~8.7\% absolute gains in Standard Accuracy.

computational linguistic, dataset, pairwise accuracy, (12 more...)

arXiv.org Artificial Intelligence

2310.06803

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Faithful Knowledge Graph Explanations for Commonsense Reasoning

Zhai, Weihe, Zubiaga, Arkaitz, Liu, Bingquan

arXiv.org Artificial IntelligenceOct-7-2023

While fusing language models (LMs) and knowledge graphs (KGs) has become common in commonsense question answering research, enabling faithful chain-of-thought explanations in these models remains an open problem. One major weakness of current KG-based explanation techniques is that they overlook the faithfulness of generated explanations during evaluation. To address this gap, we make two main contributions: (1) We propose and validate two quantitative metrics - graph consistency and graph fidelity - to measure the faithfulness of KG-based explanations. (2) We introduce Consistent GNN (CGNN), a novel training method that adds a consistency regularization term to improve explanation faithfulness. Our analysis shows that predictions from KG often diverge from original model predictions. The proposed CGNN approach boosts consistency and fidelity, demonstrating its potential for producing more faithful explanations. Our work emphasises the importance of explicitly evaluating suggest a path forward for developing architectures for faithful graph-based explanations.

commonsense reasoning, faithful knowledge graph explanation

arXiv.org Artificial Intelligence

2310.0491

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.40)

Add feedback

Jointly Training Large Autoregressive Multimodal Models

Aiello, Emanuele, Yu, Lili, Nie, Yixin, Aghajanyan, Armen, Oguz, Barlas

arXiv.org Artificial IntelligenceSep-28-2023

In recent years, advances in the large-scale pretraining of language and text-toimage models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless multimodal outputs remains a significant challenge. To address this gap, we present the Joint Autoregressive Mixture (JAM) framework, a modular approach that systematically fuses existing text and image generation models. We also introduce a specialized, data-efficient instruction-tuning strategy, tailored for mixedmodal generation tasks. Our final instruct-tuned model demonstrates unparalleled performance in generating high-quality multimodal outputs and represents the first model explicitly designed for this purpose. Autoregressive text-to-image models, as exemplified by works such as Yu et al. (2023; 2022), have made remarkable strides in generating highly detailed images, paralleling the achievements of Diffusion Models Nichol et al. (2022); ...

arxiv preprint arxiv, instruction, language model, (14 more...)

arXiv.org Artificial Intelligence

2309.15564

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Hawaii > Honolulu County (0.04)
South America > French Guiana > Guyane > Cayenne (0.04)
(3 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Consumer Health (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)

Add feedback

"Tidy Up the Table": Grounding Common-sense Objective for Tabletop Object Rearrangement

Xu, Yiqing, Hsu, David

arXiv.org Artificial IntelligenceSep-17-2023

Tidying up a messy table may appear simple for humans, but articulating clear criteria for tidiness is challenging due to the ambiguous nature of common sense reasoning. Large Language Models (LLMs) have proven capable of capturing common sense knowledge to reason over this vague concept of tidiness. However, they alone may struggle with table tidying due to the limited grasp on the spatio-visual aspects of tidiness. In this work, we aim to ground the common-sense concept of tidiness within the context of object arrangement. Our survey reveals that humans usually factorize tidiness into semantic and visual-spatial tidiness; our grounding approach aligns with this decomposition. We connect a language-based policy generator with an image-based tidiness score function: the policy generator utilizes the LLM's commonsense knowledge to cluster objects by their implicit types and functionalities for semantic tidiness; meanwhile, the tidiness score function assesses the visual-spatial relations of the object to achieve visual-spatial tidiness. Our tidiness score is trained using synthetic data generated cheaply from customized random walks, which inherently encode the order of tidiness, thereby bypassing the need for labor-intensive human demonstrations. The simulated experiment shows that our approach successfully generates tidy arrangements, predominately in 2D, with potential for 3D stacking, for tables with various novel objects.

arrangement, tidiness, visual-spatial tidiness, (15 more...)

arXiv.org Artificial Intelligence

2307.11319

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Bipol: Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets

Adewumi, Tosin, Södergren, Isabella, Alkhaled, Lama, Sabry, Sana Sabah, Liwicki, Foteini, Liwicki, Marcus

arXiv.org Artificial IntelligenceSep-16-2023

We investigate five English NLP benchmark datasets (on the superGLUE leaderboard) and two Swedish datasets for bias, along multiple axes. The datasets are the following: Boolean Question (Boolq), CommitmentBank (CB), Winograd Schema Challenge (WSC), Wino-gender diagnostic (AXg), Recognising Textual Entailment (RTE), Swedish CB, and SWEDN. Bias can be harmful and it is known to be common in data, which ML models learn from. In order to mitigate bias in data, it is crucial to be able to estimate it objectively. We use bipol, a novel multi-axes bias metric with explainability, to estimate and explain how much bias exists in these datasets. Multilingual, multi-axes bias evaluation is not very common. Hence, we also contribute a new, large Swedish bias-labelled dataset (of 2 million samples), translated from the English version and train the SotA mT5 model on it. In addition, we contribute new multi-axes lexica for bias detection in Swedish. We make the codes, model, and new dataset publicly available.

computational linguistic, dataset, top-5 gender frequent term, (12 more...)

arXiv.org Artificial Intelligence

2301.12139

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.54)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.34)

Add feedback

Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation

Zhou, Pei, Gopalakrishnan, Karthik, Hedayatnia, Behnam, Kim, Seokhwan, Pujara, Jay, Ren, Xiang, Liu, Yang, Hakkani-Tur, Dilek

arXiv.org Artificial IntelligenceSep-11-2023

Implicit knowledge, such as common sense, is key to fluid human conversations. Current neural response generation (RG) models are trained to generate responses directly, omitting unstated implicit knowledge. In this paper, we present Think-Before-Speaking (TBS), a generative approach to first externalize implicit commonsense knowledge (think) and use this knowledge to generate responses (speak). We expect that externalizing implicit knowledge allows more efficient learning, produces more informative responses, and enables more explainable models. We analyze different choices to collect knowledge-aligned dialogues, represent implicit knowledge, and transition between knowledge and dialogues. Empirical results show TBS models outperform end-to-end and knowledge-augmented RG baselines on most automatic metrics and generate more informative, specific, and commonsense-following responses, as evaluated by human annotators. TBS also generates knowledge that makes sense and is relevant to the dialogue around 85\% of the time.

computational linguistic, knowledge, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2110.08501

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(9 more...)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Knowledge Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.61)

Add feedback

Experience and Prediction: A Metric of Hardness for a Novel Litmus Test

Isaak, Nicos, Michael, Loizos

arXiv.org Artificial IntelligenceSep-5-2023

In the last decade, the Winograd Schema Challenge (WSC) has become a central aspect of the research community as a novel litmus test. Consequently, the WSC has spurred research interest because it can be seen as the means to understand human behavior. In this regard, the development of new techniques has made possible the usage of Winograd schemas in various fields, such as the design of novel forms of CAPTCHAs. Work from the literature that established a baseline for human adult performance on the WSC has shown that not all schemas are the same, meaning that they could potentially be categorized according to their perceived hardness for humans. In this regard, this \textit{hardness-metric} could be used in future challenges or in the WSC CAPTCHA service to differentiate between Winograd schemas. Recent work of ours has shown that this could be achieved via the design of an automated system that is able to output the hardness-indexes of Winograd schemas, albeit with limitations regarding the number of schemas it could be applied on. This paper adds to previous research by presenting a new system that is based on Machine Learning (ML), able to output the hardness of any Winograd schema faster and more accurately than any other previously used method. Our developed system, which works within two different approaches, namely the random forest and deep learning (LSTM-based), is ready to be used as an extension of any other system that aims to differentiate between Winograd schemas, according to their perceived hardness for humans. At the same time, along with our developed system we extend previous work by presenting the results of a large-scale experiment that shows how human performance varies across Winograd schemas.

experience and prediction, hardness, novel litmus test, (1 more...)

arXiv.org Artificial Intelligence

doi: 10.1093/logcom/exab005

2309.02534

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Generalised Winograd Schema and its Contextuality

Lo, Kin Ian, Sadrzadeh, Mehrnoosh, Mansfield, Shane

arXiv.org Artificial IntelligenceAug-31-2023

Ambiguities in natural language give rise to probability distributions over interpretations. The distributions are often over multiple ambiguous words at a time; a multiplicity which makes them a suitable topic for sheaf-theoretic models of quantum contextuality. Previous research showed that different quantitative measures of contextuality correlate well with Psycholinguistic research on lexical ambiguities. In this work, we focus on coreference ambiguities and investigate the Winograd Schema Challenge (WSC), a test proposed by Levesque in 2011 to evaluate the intelligence of machines. The WSC consists of a collection of multiple-choice questions that require disambiguating pronouns in sentences structured according to the Winograd schema, in a way that makes it difficult for machines to determine the correct referents but remains intuitive for human comprehension. In this study, we propose an approach that analogously models the Winograd schema as an experiment in quantum physics. However, we argue that the original Winograd Schema is inherently too simplistic to facilitate contextuality. We introduce a novel mechanism for generalising the schema, rendering it analogous to a Bell-CHSH measurement scenario. We report an instance of this generalised schema, complemented by the human judgements we gathered via a crowdsourcing platform. The resulting model violates the Bell-CHSH inequality by 0.192, thus exhibiting contextuality in a coreference resolution setting.

contextuality, scenario, winograd schema, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.4204/EPTCS.384.11

2308.16498

Country:

North America > United States > Indiana (0.04)
North America > Dominican Republic (0.04)
North America > Canada (0.04)
(4 more...)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)

Add feedback