AITopics | Commonsense Reasoning

Collaborating Authors

Commonsense Reasoning

Knowledge that Everyone Knows. "People do not walk on their heads." The assertion comes about 900 statements deep into the 527,308 items that comprise the Open Mind common sense database. It's after "Laws are the rules of society" and before "The sky is blue during the day." This collection of mundane facts, which would take more than 20,000 pages to print out, consists entirely of statements so unremarkable they are barely worth stating. Most of us would correctly dismiss them as common sense.
– from D.C. Denison, Guess who's smarter. Boston Globe Online (page hosted at MIT), May 26, 2003.

News Overviews Instructional Materials AI-Alerts Classics

Titans: Learning to Memorize at Test Time

Behrouz, Ali, Zhong, Peilin, Mirrokni, Vahab

arXiv.org Artificial IntelligenceDec-31-2024

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We show that this neural memory has the advantage of fast parallelizable training while maintaining a fast inference. From a memory perspective, we argue that attention due to its limited context but accurate dependency modeling performs as a short-term memory, while neural memory due to its ability to memorize the data, acts as a long-term, more persistent, memory. Based on these two modules, we introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture. Our experimental results on language modeling, common-sense reasoning, genomics, and time series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models. They further can effectively scale to larger than 2M context window size with higher accuracy in needle-in-haystack tasks compared to baselines.

arxiv preprint arxiv, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2501.00663

Country:

Europe (0.92)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.66)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

CLOVer: Cross-Layer Orthonormal Vectors Adaption

Meng, Fanxu, Zhang, Muhan

arXiv.org Artificial IntelligenceDec-21-2024

To adapt a well-trained large model to downstream tasks, we propose constraining learning within its original latent space by leveraging linear combinations of its basis vectors. This approach ensures stable training without compromising the model's capabilities. Traditionally, constructing orthonormal bases from a matrix requires a transfer matrix, which significantly increases storage and computational overhead for parameters and feature maps. In this paper, we introduce Cross-Layer Orthonormal Vectors in Q, K, V, and O matrices, enabling their orthogonalization without the need for transfer matrices. Furthermore, the CLOVer operation eliminates redundant vectors, reducing the encoder attention parameters of Whisper-large-v3 by 46.42% without requiring additional training. For parameter-efficient and stable fine-tuning, we orthonormalized Q, K, V, and O and fine-tuned only the singular values, allowing efficient adaptation while constraining changes to the original latent space. When fine-tuning LLaMA-2-7B on eight commonsense reasoning datasets, our method outperforms LoRA by 5.4% and DoRA by 3.7%. CLOVer forgetting less previous knowledge when learning new knowledge.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.17426

Country: Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Detecting Emotional Incongruity of Sarcasm by Commonsense Reasoning

Qiu, Ziqi, Yu, Jianxing, Zhang, Yufeng, Lai, Hanjiang, Rao, Yanghui, Su, Qinliang, Yin, Jian

arXiv.org Artificial IntelligenceDec-20-2024

This paper focuses on sarcasm detection, which aims to identify whether given statements convey criticism, mockery, or other negative sentiment opposite to the literal meaning. To detect sarcasm, humans often require a comprehensive understanding of the semantics in the statement and even resort to external commonsense to infer the fine-grained incongruity. However, existing methods lack commonsense inferential ability when they face complex real-world scenarios, leading to unsatisfactory performance. To address this problem, we propose a novel framework for sarcasm detection, which conducts incongruity reasoning based on commonsense augmentation, called EICR. Concretely, we first employ retrieval-augmented large language models to supplement the missing but indispensable commonsense background knowledge. To capture complex contextual associations, we construct a dependency graph and obtain the optimized topology via graph refinement. We further introduce an adaptive reasoning skeleton that integrates prior rules to extract sentiment-inconsistent subgraphs explicitly. To eliminate the possible spurious relations between words and labels, we employ adversarial contrastive learning to enhance the robustness of the detector. Experiments conducted on five datasets demonstrate the effectiveness of EICR.

commonsense reasoning, detecting emotional incongruity, sarcasm

arXiv.org Artificial Intelligence

2412.12808

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.40)

Add feedback

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

Paischer, Fabian, Hauzenberger, Lukas, Schmied, Thomas, Alkin, Benedikt, Deisenroth, Marc Peter, Hochreiter, Sepp

arXiv.org Machine LearningDec-16-2024

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation (LoRA). LoRA introduces new weight matrices that are usually initialized at random with a uniform rank distribution across the model weights. Recent works focus on different initialization schemes or the learning of adaptive ranks during fine-tuning. Both approaches have only been investigated in isolation, resulting in slow convergence or a uniform rank distribution, in turn leading to suboptimal performance. We propose to improve LoRA by initializing the new weights in a data-driven manner by computing singular value decomposition (SVD) on minibatches of activation vectors. Then, we initialize the LoRA matrices with the obtained right-singular vectors and redistribute ranks among all weight matrices to provably store the maximum amount of information of the downstream data in the newly introduced weights. In this way, only what information to maintain or neglect during the fine-tuning process needs to be learned. We call our new method $\textbf{E}$xplained $\textbf{V}$ariance $\textbf{A}$daptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and achieves the highest average score across a multitude of tasks per domain while reducing the number of trainable parameters through rank redistribution.

large language model, machine learning, natural language, (23 more...)

arXiv.org Machine Learning

2410.0717

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.14)
(27 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(5 more...)

Add feedback

ConceptEdit: Conceptualization-Augmented Knowledge Editing in Large Language Models for Commonsense Reasoning

Zhang, Liyu, Wang, Weiqi, Fang, Tianqing, Song, Yangqiu

arXiv.org Artificial IntelligenceDec-15-2024

Knowledge Editing (KE) aims to adjust a Large Language Model's (LLM) internal representations and parameters to correct inaccuracies and improve output consistency without incurring the computational expense of re-training the entire model. However, editing commonsense knowledge still faces difficulties, including limited knowledge coverage in existing resources, the infeasibility of annotating labels for an overabundance of commonsense knowledge, and the strict knowledge formats of current editing methods. In this paper, we address these challenges by presenting ConceptEdit, a framework that integrates conceptualization and instantiation into the KE pipeline for LLMs to enhance their commonsense reasoning capabilities. ConceptEdit dynamically diagnoses implausible commonsense knowledge within an LLM using another verifier LLM and augments the source knowledge to be edited with conceptualization for stronger generalizability. Experimental results demonstrate that LLMs enhanced with ConceptEdit successfully generate commonsense knowledge with improved plausibility compared to other baselines and achieve stronger performance across multiple question answering benchmarks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.11418

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.05)
North America > United States > Florida > Miami-Dade County > Miami (0.05)
(17 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

TECO: Improving Multimodal Intent Recognition with Text Enhancement through Commonsense Knowledge Extraction

Nguyen, Quynh-Mai Thi, Nguyen, Lan-Nhi Thi, Nguyen, Cam-Van Thi

arXiv.org Artificial IntelligenceDec-11-2024

The objective of multimodal intent recognition (MIR) is to leverage various modalities-such as text, video, and audio-to detect user intentions, which is crucial for understanding human language and context in dialogue systems. Despite advances in this field, two main challenges persist: (1) effectively extracting and utilizing semantic information from robust textual features; (2) aligning and fusing non-verbal modalities with verbal ones effectively. This paper proposes a Text Enhancement with CommOnsense Knowledge Extractor (TECO) to address these challenges. We begin by extracting relations from both generated and retrieved knowledge to enrich the contextual information in the text modality. Subsequently, we align and integrate visual and acoustic representations with these enhanced text features to form a cohesive multimodal representation. Our experimental results show substantial improvements over existing baseline methods.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.08529

Country: South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Gated Delta Networks: Improving Mamba2 with Delta Rule

Yang, Songlin, Kautz, Jan, Hatamizadeh, Ali

arXiv.org Artificial IntelligenceDec-9-2024

Linear Transformers have gained attention as efficient alternatives to standard Transformers, but their performance in retrieval and long-context tasks has been limited. To address these limitations, recent work has explored two distinct mechanisms: gating for adaptive memory control and the delta update rule for precise memory modifications. We observe that these mechanisms are complementary: gating enables rapid memory erasure while the delta rule facilitates targeted updates. Building on this insight, we introduce the gated delta rule and develop a parallel training algorithm optimized for modern hardware. Our proposed architecture, Gated DeltaNet, consistently surpasses existing models like Mamba2 and DeltaNet across multiple benchmarks, including language modeling, common-sense reasoning, in-context retrieval, length extrapolation, and long-context understanding. We further enhance performance by developing hybrid architectures that combine Gated DeltaNet layers with sliding window attention or Mamba2 layers, achieving both improved training efficiency and superior task performance.

computational linguistic, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.06464

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Maryland > Baltimore (0.04)
(17 more...)

Genre: Research Report (0.42)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor

Chen, Jiali, Hei, Xusen, Xue, Yuqi, Wei, Yuancheng, Xie, Jiayuan, Cai, Yi, Li, Qing

arXiv.org Artificial IntelligenceDec-7-2024

Large multimodal models (LMMs) have shown remarkable performance in the visual commonsense reasoning (VCR) task, which aims to answer a multiple-choice question based on visual commonsense within an image. However, the ability of LMMs to correct potential visual commonsense errors in the distractor upon their occurrence is yet under-explored. Drawing inspiration from how a human teacher crafts challenging distractors to test students' comprehension of the concepts or skills and assists them in identifying and correcting errors toward the answer, we are the pioneering research for LMMs to simulate this error correction process. To this end, we employ GPT-4 as a ``teacher'' to collect the explainable feedback dataset VCR-DF for error correction, which serves as a benchmark to evaluate the ability of LMMs to identify misconceptions and clarify reasons behind the error in VCR distractors toward final answers. In addition, we propose an LMM-based Pedagogical Expert Instructed Feedback Generation (PEIFG) model to incorporate the learnable expert prompts and multimodal instruction as guidance for feedback generation. Experimental results show that our PEIFG significantly outperforms existing LMMs. We believe that our benchmark provides a new direction for evaluating the capabilities of LMMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3664647.3681590

2412.07801

Country:

Oceania > Australia > Victoria > Melbourne (0.15)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry: Education (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.61)
(2 more...)

Add feedback

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Bhagia, Akshita, Liu, Jiacheng, Wettig, Alexander, Heineman, David, Tafjord, Oyvind, Jha, Ananya Harsh, Soldaini, Luca, Smith, Noah A., Groeneveld, Dirk, Koh, Pang Wei, Dodge, Jesse, Hajishirzi, Hannaneh

arXiv.org Artificial IntelligenceDec-5-2024

We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting. Standard power laws for language modeling loss cannot accurately model task performance. Therefore, we leverage a two-step prediction approach: first use model and data size to predict a task-specific loss, and then use this task loss to predict task performance. We train a set of small-scale "ladder" models, collect data points to fit the parameterized functions of the two prediction steps, and make predictions for two target models: a 7B model trained to 4T tokens and a 13B model trained to 5T tokens. Training the ladder models only costs 1% of the compute used for the target models. On four multiple-choice tasks written in ranked classification format, we can predict the accuracy of both target models within 2 points of absolute error. We have higher prediction error on four other tasks (average absolute error 6.9) and find that these are often tasks with higher variance in task metrics. We also find that using less compute to train fewer ladder models tends to deteriorate predictions. Finally, we empirically show that our design choices and the two-step approach lead to superior performance in establishing scaling laws.

accuracy, ladder model, task loss, (15 more...)

arXiv.org Artificial Intelligence

2412.04403

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.34)

Add feedback

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Chou, Yuhong, Yao, Man, Wang, Kexin, Pan, Yuqi, Zhu, Ruijie, Zhong, Yiran, Qiao, Yu, Wu, Jibin, Xu, Bo, Li, Guoqi

arXiv.org Artificial IntelligenceNov-16-2024

Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models as the linear attention form and then identify three conditions for the optimal linear attention design: i) Dynamic memory ability; ii) Static approximation ability; iii) Least parameter approximation. We find that none of the current linear models meet all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than the existing linear models.

approximation, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.10741

Country:

North America > United States (0.14)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
(2 more...)

Add feedback