personx
Commonsense Knowledge Editing Based on Free-Text in LLMs
Huang, Xiusheng, Wang, Yequan, Zhao, Jun, Liu, Kang
Knowledge editing technology is crucial for maintaining the accuracy and timeliness of large language models (LLMs) . However, the setting of this task overlooks a significant portion of commonsense knowledge based on free-text in the real world, characterized by broad knowledge scope, long content and non instantiation. The editing objects of previous methods (e.g., MEMIT) were single token or entity, which were not suitable for commonsense knowledge in free-text form. To address the aforementioned challenges, we conducted experiments from two perspectives: knowledge localization and knowledge editing. Firstly, we introduced Knowledge Localization for Free-Text(KLFT) method, revealing the challenges associated with the distribution of commonsense knowledge in MLP and Attention layers, as well as in decentralized distribution. Next, we propose a Dynamics-aware Editing Method(DEM), which utilizes a Dynamics-aware Module to locate the parameter positions corresponding to commonsense knowledge, and uses Knowledge Editing Module to update knowledge. The DEM method fully explores the potential of the MLP and Attention layers, and successfully edits commonsense knowledge based on free-text. The experimental results indicate that the DEM can achieve excellent editing performance.
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs
Fang, Tianqing, Chen, Zeming, Song, Yangqiu, Bosselut, Antoine
Event commonsense reasoning requires the ability to reason about the relationship between events, as well as infer implicit context underlying that relationship. However, data scarcity makes it challenging for language models to learn to generate commonsense inferences for contexts and questions involving interactions between complex events. To address this demand, we present COM2 (COMplex COMmonsense), a new dataset created by sampling multi-hop logical queries (e.g., the joint effect or cause of both event A and B, or the effect of the effect of event C) from an existing commonsense knowledge graph (CSKG), and verbalizing them using handcrafted rules and large language models into multiple-choice and text generation questions. Our experiments show that language models trained on COM2 exhibit significant improvements in complex reasoning ability, resulting in enhanced zero-shot performance in both in-domain and out-of-domain tasks for question answering and generative commonsense reasoning, without expensive human annotations. Code and data are available at https://github.com/tqfang/complex-commonsense-reasoning.
Acquiring and Modelling Abstract Commonsense Knowledge via Conceptualization
He, Mutian, Fang, Tianqing, Wang, Weiqi, Song, Yangqiu
Conceptualization, or viewing entities and situations as instances of abstract concepts in mind and making inferences based on that, is a vital component in human intelligence for commonsense reasoning. Despite recent progress in artificial intelligence to acquire and model commonsense attributed to neural language models and commonsense knowledge graphs (CKGs), conceptualization is yet to be introduced thoroughly, making current approaches ineffective to cover knowledge about countless diverse entities and situations in the real world. To address the problem, we thoroughly study the role of conceptualization in commonsense reasoning, and formulate a framework to replicate human conceptual induction by acquiring abstract knowledge about events regarding abstract concepts, as well as higher-level triples or inferences upon them. We then apply the framework to ATOMIC, a large-scale human-annotated CKG, aided by the taxonomy Probase. We annotate a dataset on the validity of contextualized conceptualizations from ATOMIC on both event and triple levels, develop a series of heuristic rules based on linguistic features, and train a set of neural models to generate and verify abstract knowledge. Based on these components, a pipeline to acquire abstract knowledge is built. A large abstract CKG upon ATOMIC is then induced, ready to be instantiated to infer about unseen entities or situations. Finally, we empirically show the benefits of augmenting CKGs with abstract knowledge in downstream tasks like commonsense inference and zero-shot commonsense QA.
K-Act2Emo: Korean Commonsense Knowledge Graph for Indirect Emotional Expression
Kim, Kyuhee, Lee, Surin, Lee, Sangah
In many literary texts, emotions are indirectly conveyed through descriptions of actions, facial expressions, and appearances, necessitating emotion inference for narrative understanding. In this paper, we introduce K-Act2Emo, a Korean commonsense knowledge graph (CSKG) comprising 1,900 indirect emotional expressions and the emotions inferable from them. We categorize reasoning types into inferences in positive situations, inferences in negative situations, and inferences when expressions do not serve as emotional cues. Unlike existing CSKGs, K-Act2Emo specializes in emotional contexts, and experimental results validate its effectiveness for training emotion inference models. Significantly, the BART-based knowledge model fine-tuned with K-Act2Emo outperforms Figure 1: Illustration of inferential knowledge in K-various existing Korean large language Act2Emo: PosEnv for inferences in positive situations, models, achieving performance levels comparable NegEnv for negative situations, and NonEmo when expressions to GPT-4 Turbo.
ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases
Do, Quyet V., Fang, Tianqing, Diao, Shizhe, Wang, Zhaowei, Song, Yangqiu
Reasoning over Commonsense Knowledge Bases (CSKB), i.e. CSKB reasoning, has been explored as a way to acquire new commonsense knowledge based on reference knowledge in the original CSKBs and external prior knowledge. Despite the advancement of Large Language Models (LLM) and prompt engineering techniques in various reasoning tasks, they still struggle to deal with CSKB reasoning. One of the problems is that it is hard for them to acquire explicit relational constraints in CSKBs from only in-context exemplars, due to a lack of symbolic reasoning capabilities (Bengio et al., 2021). To this end, we proposed **ConstraintChecker**, a plugin over prompting techniques to provide and check explicit constraints. When considering a new knowledge instance, ConstraintChecker employs a rule-based module to produce a list of constraints, then it uses a zero-shot learning module to check whether this knowledge instance satisfies all constraints. The acquired constraint-checking result is then aggregated with the output of the main prompting technique to produce the final output. Experimental results on CSKB Reasoning benchmarks demonstrate the effectiveness of our method by bringing consistent improvements over all prompting methods. Codes and data are available at \url{https://github.com/HKUST-KnowComp/ConstraintChecker}.
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning
Wang, Weiqi, Fang, Tianqing, Li, Chunyang, Shi, Haochen, Ding, Wenxuan, Xu, Baixuan, Wang, Zhaowei, Bai, Jiaxin, Liu, Xin, Cheng, Jiayang, Chan, Chunkit, Song, Yangqiu
The sequential process of conceptualization and instantiation is essential to generalizable commonsense reasoning as it allows the application of existing knowledge to unfamiliar scenarios. However, existing works tend to undervalue the step of instantiation and heavily rely on pre-built concept taxonomies and human annotations to collect both types of knowledge, resulting in a lack of instantiated knowledge to complete reasoning, high cost, and limited scalability. To tackle these challenges, we introduce CANDLE, a distillation framework that iteratively performs contextualized conceptualization and instantiation over commonsense knowledge bases by instructing large language models to generate both types of knowledge with critic filtering. By applying CANDLE to ATOMIC, we construct a comprehensive knowledge base comprising six million conceptualizations and instantiated commonsense knowledge triples. Both types of knowledge are firmly rooted in the original ATOMIC dataset, and intrinsic evaluations demonstrate their exceptional quality and diversity. Empirical results indicate that distilling CANDLE on student models provides benefits across four downstream tasks. Our code, data, and models are publicly available at https://github.com/HKUST-KnowComp/CANDLE.
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
West, Peter, Bras, Ronan Le, Sorensen, Taylor, Lin, Bill Yuchen, Jiang, Liwei, Lu, Ximing, Chandu, Khyathi, Hessel, Jack, Baheti, Ashutosh, Bhagavatula, Chandra, Choi, Yejin
We present NovaCOMET, an open commonsense knowledge model, that combines the best aspects of knowledge and general task models. Compared to previous knowledge models, NovaCOMET allows open-format relations enabling direct application to reasoning tasks; compared to general task models like Flan-T5, it explicitly centers knowledge, enabling superior performance for commonsense reasoning. NovaCOMET leverages the knowledge of opaque proprietary models to create an open knowledge pipeline. First, knowledge is symbolically distilled into NovATOMIC, a publicly-released discrete knowledge graph which can be audited, critiqued, and filtered. Next, we train NovaCOMET on NovATOMIC by fine-tuning an open-source pretrained model. NovaCOMET uses an open-format training objective, replacing the fixed relation sets of past knowledge models, enabling arbitrary structures within the data to serve as inputs or outputs. The resulting generation model, optionally augmented with human annotation, matches or exceeds comparable open task models like Flan-T5 on a range of commonsense generation tasks. NovaCOMET serves as a counterexample to the contemporary focus on instruction tuning only, demonstrating a distinct advantage to explicitly modeling commonsense knowledge as well.
ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems
Ghazarian, Sarik, Shao, Yijia, Han, Rujun, Galstyan, Aram, Peng, Nanyun
Commonsense reasoning is omnipresent in human communications and thus is an important feature for open-domain dialogue systems. However, evaluating commonsense in dialogue systems is still an open challenge. We take the first step by focusing on event commonsense that considers events and their relations, and is crucial in both dialogues and general commonsense reasoning. We propose ACCENT, an event commonsense evaluation metric empowered by commonsense knowledge bases (CSKBs). ACCENT first extracts event-relation tuples from a dialogue, and then evaluates the response by scoring the tuples in terms of their compatibility with the CSKB. To evaluate ACCENT, we construct the first public event commonsense evaluation dataset for open-domain dialogues. Our experiments show that ACCENT is an efficient metric for event commonsense evaluation, which achieves higher correlations with human judgments than existing baselines.
FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions
Kim, Hyunwoo, Sclar, Melanie, Zhou, Xuhui, Bras, Ronan Le, Kim, Gunhee, Choi, Yejin, Sap, Maarten
Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating large language models (LLMs). In particular, we formulate multiple types of questions that demand the same underlying reasoning to identify illusory or false sense of ToM capabilities in LLMs. We show that FANToM is challenging for state-of-the-art LLMs, which perform significantly worse than humans even with chain-of-thought reasoning or fine-tuning.