AITopics | Commonsense Reasoning

Collaborating Authors

Commonsense Reasoning

Knowledge that Everyone Knows. "People do not walk on their heads." The assertion comes about 900 statements deep into the 527,308 items that comprise the Open Mind common sense database. It's after "Laws are the rules of society" and before "The sky is blue during the day." This collection of mundane facts, which would take more than 20,000 pages to print out, consists entirely of statements so unremarkable they are barely worth stating. Most of us would correctly dismiss them as common sense.
– from D.C. Denison, Guess who's smarter. Boston Globe Online (page hosted at MIT), May 26, 2003.

News Overviews Instructional Materials AI-Alerts Classics

What Really is Commonsense Knowledge?

Do, Quyet V., Li, Junze, Vuong, Tung-Duong, Wang, Zhaowei, Song, Yangqiu, Ma, Xiaojuan

arXiv.org Artificial IntelligenceNov-6-2024

Commonsense datasets have been well developed in Natural Language Processing, mainly through crowdsource human annotation. However, there are debates on the genuineness of commonsense reasoning benchmarks. In specific, a significant portion of instances in some commonsense benchmarks do not concern commonsense knowledge. That problem would undermine the measurement of the true commonsense reasoning ability of evaluated models. It is also suggested that the problem originated from a blurry concept of commonsense knowledge, as distinguished from other types of knowledge. To demystify all of the above claims, in this study, we survey existing definitions of commonsense knowledge, ground into the three frameworks for defining concepts, and consolidate them into a multi-framework unified definition of commonsense knowledge (so-called consolidated definition). We then use the consolidated definition for annotations and experiments on the CommonsenseQA and CommonsenseQA 2.0 datasets to examine the above claims. Our study shows that there exists a large portion of non-commonsense-knowledge instances in the two datasets, and a large performance gap on these two subsets where Large Language Models (LLMs) perform worse on commonsense-knowledge instances.

commonsense, knowledge, referenced knowledge, (14 more...)

arXiv.org Artificial Intelligence

2411.03964

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(11 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Industry:

Education (0.68)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Multimodal Commonsense Knowledge Distillation for Visual Question Answering

Yang, Shuo, Luo, Siwen, Han, Soyeon Caren

arXiv.org Artificial IntelligenceNov-4-2024

Existing Multimodal Large Language Models (MLLMs) and Visual Language Pretrained Models (VLPMs) have shown remarkable performances in the general Visual Question Answering (VQA). However, these models struggle with VQA questions that require external commonsense knowledge due to the challenges in generating high-quality prompts and the high computational costs of fine-tuning. In this work, we propose a novel graph-based multimodal commonsense knowledge distillation framework that constructs a unified relational graph over commonsense knowledge, visual objects and questions through a Graph Convolutional Network (GCN) following a teacher-student environment. This proposed framework is flexible with any type of teacher and student models without further fine-tuning, and has achieved competitive performances on the ScienceQA dataset.

artificial intelligence, natural language, proceedings, (17 more...)

arXiv.org Artificial Intelligence

2411.02722

Country: Oceania > Australia > Western Australia (0.04)

Genre: Research Report (0.65)

Industry: Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Commonsense Knowledge Editing Based on Free-Text in LLMs

Huang, Xiusheng, Wang, Yequan, Zhao, Jun, Liu, Kang

arXiv.org Artificial IntelligenceOct-31-2024

Knowledge editing technology is crucial for maintaining the accuracy and timeliness of large language models (LLMs) . However, the setting of this task overlooks a significant portion of commonsense knowledge based on free-text in the real world, characterized by broad knowledge scope, long content and non instantiation. The editing objects of previous methods (e.g., MEMIT) were single token or entity, which were not suitable for commonsense knowledge in free-text form. To address the aforementioned challenges, we conducted experiments from two perspectives: knowledge localization and knowledge editing. Firstly, we introduced Knowledge Localization for Free-Text(KLFT) method, revealing the challenges associated with the distribution of commonsense knowledge in MLP and Attention layers, as well as in decentralized distribution. Next, we propose a Dynamics-aware Editing Method(DEM), which utilizes a Dynamics-aware Module to locate the parameter positions corresponding to commonsense knowledge, and uses Knowledge Editing Module to update knowledge. The DEM method fully explores the potential of the MLP and Attention layers, and successfully edits commonsense knowledge based on free-text. The experimental results indicate that the DEM can achieve excellent editing performance.

commonsense knowledge, editing, knowledge, (14 more...)

arXiv.org Artificial Intelligence

2410.23844

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Reasons and Solutions for the Decline in Model Performance after Editing

Huang, Xiusheng, Liu, Jiaxiang, Wang, Yequan, Liu, Kang

arXiv.org Artificial IntelligenceOct-31-2024

Knowledge editing technology has received widespread attention for low-cost updates of incorrect or outdated knowledge in large-scale language models. However, recent research has found that edited models often exhibit varying degrees of performance degradation. The reasons behind this phenomenon and potential solutions have not yet been provided. In order to investigate the reasons for the performance decline of the edited model and optimize the editing method, this work explores the underlying reasons from both data and model perspectives. Specifically, 1) from a data perspective, to clarify the impact of data on the performance of editing models, this paper first constructs a Multi-Question Dataset (MQD) to evaluate the impact of different types of editing data on model performance. The performance of the editing model is mainly affected by the diversity of editing targets and sequence length, as determined through experiments. 2) From a model perspective, this article explores the factors that affect the performance of editing models. The results indicate a strong correlation between the L1-norm of the editing model layer and the editing accuracy, and clarify that this is an important factor leading to the bottleneck of editing performance. Finally, in order to improve the performance of the editing model, this paper further proposes a Dump for Sequence (D4S) method, which successfully overcomes the previous editing bottleneck by reducing the L1-norm of the editing layer, allowing users to perform multiple effective edits and minimizing model damage. Our code is available at https://github.com/nlpkeg/D4S.

arxiv preprint arxiv, edited model, editing, (12 more...)

arXiv.org Artificial Intelligence

2410.23843

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)

Add feedback

LoRA vs Full Fine-tuning: An Illusion of Equivalence

Shuttleworth, Reece, Andreas, Jacob, Torralba, Antonio, Sharma, Pratyusha

arXiv.org Artificial IntelligenceOct-28-2024

Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to match the performance of fully fine-tuned models on various tasks with an extreme reduction in the number of trainable parameters. Even in settings where both methods learn similarly accurate models, \emph{are their learned solutions really equivalent?} We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure; moreover, the fine-tuned models themselves show distinct generalization behaviors when tested outside the adaptation task's distribution. More specifically, we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call \emph{intruder dimensions}. Intruder dimensions do not appear during full fine-tuning. Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially. Higher-rank, rank-stabilized LoRA models closely mirror full fine-tuning, even when performing on par with lower-rank LoRA models on the same tasks. These results suggest that models updated with LoRA and full fine-tuning access different parts of parameter space, even when they perform equally on the fine-tuned distribution. We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.21228

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

Add feedback

Revealing and Mitigating the Local Pattern Shortcuts of Mamba

You, Wangjie, Tang, Zecheng, Li, Juntao, Yao, Lili, Zhang, Min

arXiv.org Artificial IntelligenceOct-21-2024

Large language models (LLMs) have advanced significantly due to the attention mechanism, but their quadratic complexity and linear memory demands limit their performance on long-context tasks. Recently, researchers introduced Mamba, an advanced model built upon State Space Models(SSMs) that offers linear complexity and constant memory. Although Mamba is reported to match or surpass the performance of attention-based models, our analysis reveals a performance gap: Mamba excels in tasks that involve localized key information but faces challenges with tasks that require handling distributed key information. Our controlled experiments suggest that this inconsistency arises from Mamba's reliance on local pattern shortcuts, which enable the model to remember local key information within its limited memory but hinder its ability to retain more dispersed information. Therefore, we introduce a global selection module into the Mamba model to address this issue. Experiments on both existing and proposed synthetic tasks, as well as real-world tasks, demonstrate the effectiveness of our method. Notably, with the introduction of only 4M extra parameters, our approach enables the Mamba model(130M) to achieve a significant improvement on tasks with distributed information, increasing its performance from 0 to 80.54 points.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.15678

Genre:

Research Report > Experimental Study (0.88)
Research Report > Strength High (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.67)

Add feedback

Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction

Han, Kaiqiao, Fang, Tianqing, Wang, Zhaowei, Song, Yangqiu, Steedman, Mark

arXiv.org Artificial IntelligenceOct-15-2024

While Large Language Models (LLMs) have showcased remarkable proficiency in reasoning, there is still a concern about hallucinations and unreliable reasoning issues due to semantic associations and superficial logical chains. To evaluate the extent to which LLMs perform robust reasoning instead of relying on superficial logical chains, we propose a new evaluation dataset, the Concept-Reversed Winograd Schema Challenge (CR-WSC), based on the famous Winograd Schema Challenge (WSC) dataset. By simply reversing the concepts to those that are more associated with the wrong answer, we find that the performance of LLMs drops significantly despite the rationale of reasoning remaining the same. Furthermore, we propose Abstraction-of-Thought (AoT), a novel prompt method for recovering adversarial cases to normal cases using conceptual abstraction to improve LLMs' robustness and consistency in reasoning, as demonstrated by experiments on CR-WSC.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.1204

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Dominican Republic (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Solving the Challenge Set without Solving the Task: On Winograd Schemas as a Test of Pronominal Coreference Resolution

Porada, Ian, Cheung, Jackie Chi Kit

arXiv.org Artificial IntelligenceOct-12-2024

Challenge sets such as the Winograd Schema Challenge (WSC) are used to benchmark systems' ability to resolve ambiguities in natural language. If one assumes as in existing work that solving a given challenge set is at least as difficult as solving some more general task, then high performance on the challenge set should indicate high performance on the general task overall. However, we show empirically that this assumption of difficulty does not always hold. In particular, we demonstrate that despite the strong performance of prompted language models (LMs) on the WSC and its variants, these same modeling techniques perform relatively poorly at resolving certain pronominal ambiguities attested in OntoNotes and related datasets that are perceived to be easier. Motivated by these findings, we propose a method for ensembling a prompted LM with a supervised, task-specific system that is overall more accurate at resolving pronominal coreference across datasets. Finally, we emphasize that datasets involving the same linguistic phenomenon draw on distinct, but overlapping, capabilities, and evaluating on any one dataset alone does not provide a complete picture of a system's overall capability.

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2410.09448

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(29 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Education (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Zero-shot Commonsense Reasoning over Machine Imagination

Park, Hyuntae, Kim, Yeachan, Park, Jun-Hyung, Lee, SangKeun

arXiv.org Artificial IntelligenceOct-11-2024

Recent approaches to zero-shot commonsense reasoning have enabled Pre-trained Language Models (PLMs) to learn a broad range of commonsense knowledge without being tailored to specific situations. However, they often suffer from human reporting bias inherent in textual commonsense knowledge, leading to discrepancies in understanding between PLMs and humans. In this work, we aim to bridge this gap by introducing an additional information channel to PLMs. We propose Imagine (Machine Imagination-based Reasoning), a novel zero-shot commonsense reasoning framework designed to complement textual inputs with visual signals derived from machine-generated images. To achieve this, we enhance PLMs with imagination capabilities by incorporating an image generator into the reasoning process. To guide PLMs in effectively leveraging machine imagination, we create a synthetic pre-training dataset that simulates visual question-answering. Our extensive experiments on diverse reasoning benchmarks and analysis show that Imagine outperforms existing methods by a large margin, highlighting the strength of machine imagination in mitigating reporting bias and enhancing generalization capabilities.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.09329

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > Middle East (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning

Neural Information Processing SystemsOct-10-2024, 15:53:16 GMT

In this paper, we propose a Disentangled Counterfactual Learning (DCL) approach for physical audiovisual commonsense reasoning. The task aims to infer objects' physics commonsense based on both video and audio input, with the main challenge is how to imitate the reasoning ability of humans. Most of the current methods fail to take full advantage of different characteristics in multi-modal data, and lacking causal reasoning ability in models impedes the progress of implicit physical knowledge inferring. To address these issues, our proposed DCL method decouples videos into static (time-invariant) and dynamic (time-varying) factors in the latent space by the disentangled sequential encoder, which adopts a variational autoencoder (VAE) to maximize the mutual information with a contrastive loss function. Furthermore, we introduce a counterfactual learning module to augment the model's reasoning ability by modeling physical knowledge relationships among different objects under counterfactual intervention.

disentangled counterfactual learning, physical audiovisual commonsense reasoning, reasoning ability, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.65)

Add feedback