
Collaborating Authors

Ye, Xiaotian


UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets

arXiv.org Artificial Intelligence

Large Language Models (LLMs) inevitably acquire harmful information during training on massive datasets. LLM unlearning aims to eliminate the influence of such harmful information while maintaining the model's overall performance. Existing unlearning methods, exemplified by gradient-ascent-based approaches, primarily focus on forgetting the target data while overlooking the crucial impact of logically related knowledge on unlearning effectiveness. In this paper, through both theoretical and experimental analyses, we first demonstrate that a key reason for suboptimal unlearning performance is that models can reconstruct the target content by reasoning over logically related knowledge. To address this issue, we propose Unlearning Improvement via Parameter Extrapolation (UIPE), a method that removes knowledge highly correlated with the forgetting targets. Experimental results show that UIPE significantly enhances the performance of various mainstream LLM unlearning methods on the TOFU benchmark.
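The abstract names gradient ascent as the representative unlearning baseline and parameter extrapolation as UIPE's core operation, but specifies neither update rule. Below is a minimal PyTorch sketch under stated assumptions: the gradient-ascent step is the standard negated language-modeling loss on the forget set, and the extrapolation is assumed to push the weights further along the original-to-unlearned direction so that knowledge correlated with the target is attenuated as well. The function names and the alpha value are illustrative, not taken from the paper.

```python
import torch

def gradient_ascent_step(model, optimizer, batch):
    """Standard gradient-ascent unlearning baseline: maximize the
    language-modeling loss on a forget-set batch by minimizing its
    negation. `batch` holds input_ids/attention_mask tensors."""
    loss = -model(**batch, labels=batch["input_ids"]).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def extrapolate_unlearning(original_state, unlearned_state, alpha=0.5):
    """Assumed reading of UIPE's 'parameter extrapolation': scale the
    unlearning delta (unlearned - original) past 1.0 so that knowledge
    logically related to the forgetting target is suppressed as well.
    `alpha` is an illustrative knob, not a value from the paper."""
    with torch.no_grad():
        return {
            name: p + alpha * (p - original_state[name])
            for name, p in unlearned_state.items()
        }
```

The extrapolated state dict would be loaded back with model.load_state_dict(...); the paper presumably restricts extrapolation to knowledge "highly correlated with the forgetting targets", a selection step this sketch omits.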


Uncovering Overfitting in Large Language Model Editing

arXiv.org Artificial Intelligence

Knowledge editing has been proposed as an effective method for updating and correcting the internal knowledge of Large Language Models (LLMs). In this paper, we identify and investigate the phenomenon of Editing Overfit, where edited models assign disproportionately high probabilities to the edit target, hindering the generalization of the new knowledge in complex scenarios. We attribute this issue to the current editing paradigm, which places excessive emphasis on the direct correspondence between the input prompt and the edit target for each edit sample. To explore this issue further, we introduce a new benchmark, EVOKE (EValuation of Editing Overfit in Knowledge Editing), along with fine-grained evaluation metrics. Through comprehensive experiments and analysis, we demonstrate that Editing Overfit is prevalent in current editing methods and that common overfitting mitigation strategies are of limited effectiveness in knowledge editing. To overcome this, inspired by LLMs' knowledge recall mechanisms, we propose a new plug-and-play strategy called Learn to Inference (LTI), which introduces a Multi-stage Inference Constraint module to guide edited models in recalling new knowledge the way unedited LLMs leverage knowledge through in-context learning.

Large Language Models (LLMs) have achieved remarkable success across various Natural Language Processing (NLP) tasks (Zhao et al., 2023), yet they often contain outdated or incorrect information, raising concerns about their reliability and factual accuracy. Knowledge editing (Yao et al., 2023) has emerged as a promising solution for precisely updating or correcting a model's knowledge. Approaches to knowledge editing fall into two main categories: parameter-preserving methods, such as SERAC (Mitchell et al., 2022) and T-Patcher (Huang et al.), which adjust outputs by storing external knowledge, and parameter-modifying methods, which directly alter the model's internal parameters. The latter include fine-tuning-based methods like FT-L (Zhu et al., 2020), meta-learning approaches such as KE (De Cao et al., 2021) and MEND (Mitchell et al., 2021), and locate-then-edit methods like ROME (Meng et al., 2022a) and MEMIT (Meng et al., 2022b). Although existing methods have achieved promising results, their performance declines catastrophically when transferred to complex tasks involving reasoning (Yao et al., 2023). For instance, in the representative multi-hop reasoning task, after an LLM is edited to state that Steve Jobs is the founder of Microsoft, it can easily answer the straightforward question "Who is the founder of Microsoft?" with "Steve Jobs."
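EVOKE's concrete metrics are not given in this excerpt, but the phenomenon it measures can be illustrated with a simple probe: compare the probability an edited model assigns to the edit target under the direct edit prompt versus under a multi-hop prompt where that target would be the wrong answer. The sketch below uses Hugging Face transformers; target_probability and the example prompts are illustrative, not EVOKE's API or test cases.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def target_probability(model, tokenizer, prompt, target):
    """Probability the model assigns to `target` as a continuation of
    `prompt` (product of per-token probabilities). Assumes the prompt
    tokenization is a prefix of the full-string tokenization, which
    holds for typical BPE tokenizers on plain text."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + target, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    # The token at position `pos` is predicted by the logits at `pos - 1`.
    total = sum(
        log_probs[0, pos - 1, full_ids[0, pos]].item()
        for pos in range(prompt_ids.shape[1], full_ids.shape[1])
    )
    return math.exp(total)

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# After an edit asserting "Steve Jobs founded Microsoft", an overfit
# model keeps emitting the edit target even when it is wrong:
direct = target_probability(
    model, tokenizer, "Who is the founder of Microsoft?", "Steve Jobs")
multihop = target_probability(
    model, tokenizer,
    "Who is the CEO of the company founded by Steve Jobs?", "Steve Jobs")
```

An overfit edited model keeps both probabilities high, producing the edit target even for the multi-hop question whose gold answer is someone else; an edit that generalized properly would apply the new fact only where it belongs.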


Knowledge Graph Enhanced Large Language Model Editing

arXiv.org Artificial Intelligence

Large language models (LLMs) are pivotal in advancing natural language processing (NLP) tasks, yet their efficacy is hampered by inaccuracies and outdated knowledge. Model editing emerges as a promising solution to address these challenges. However, existing editing methods struggle to track and incorporate the changes in knowledge associated with an edit, which limits the generalization ability of post-edit LLMs in processing edited knowledge. To tackle these problems, we propose a novel model editing method that leverages knowledge graphs to enhance LLM editing, namely GLAME. Specifically, we first utilize a knowledge graph augmentation module to uncover associated knowledge that has changed due to the edit and to obtain its internal representations within the LLM. This allows knowledge alterations within the LLM to be reflected through an external graph structure. We then design a graph-based knowledge edit module to integrate this structured knowledge into model editing, ensuring that the updated parameters reflect not only the modifications of the edited knowledge but also the changes in other associated knowledge resulting from the editing process. Comprehensive experiments on GPT-J and GPT-2 XL demonstrate that GLAME significantly improves the generalization capabilities of post-edit LLMs in employing edited knowledge.
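The abstract describes the knowledge graph augmentation step only at a high level. A hedged sketch of the idea, using networkx: after an edit (subject, relation, old_object → new_object), outgoing edges of new_object in an external KG compose with the edit into derived multi-hop facts about the subject whose answers have also changed and should be reflected in the update. associated_facts and the toy triples are illustrative, not GLAME's actual interface or data.

```python
import networkx as nx

def associated_facts(kg: nx.MultiDiGraph, subject, relation, new_object):
    """After the edit (subject, relation, *) -> new_object, each KG edge
    (new_object, r', o') composes into a derived two-hop fact
    (subject, relation . r', o') whose answer has changed as a side
    effect of the edit. One-hop sketch only; per the abstract, GLAME
    additionally maps such subgraphs to internal LLM representations
    before applying the parameter update."""
    return [
        (subject, (relation, data["relation"]), obj)
        for _, obj, data in kg.out_edges(new_object, data=True)
    ]

kg = nx.MultiDiGraph()
kg.add_edge("Satya Nadella", "India", relation="country_of_citizenship")

# Edit: the CEO of Microsoft is (now) Satya Nadella.
print(associated_facts(kg, "Microsoft", "CEO", "Satya Nadella"))
# -> [('Microsoft', ('CEO', 'country_of_citizenship'), 'India')]
```

Propagating such derived facts into the edit is what, per the abstract, lets the post-edit model answer questions about associated knowledge rather than only the directly edited triple.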