A General Framework to Enhance Fine-tuning-based LLM Unlearning
Jie Ren, Zhenwei Dai, Xianfeng Tang, Hui Liu, Jingying Zeng, Zhen Li, Rahul Goutam, Suhang Wang, Yue Xing, Qi He, Hui Liu
arXiv.org Artificial Intelligence
Unlearning has been proposed to remove copyrighted and privacy-sensitive data from Large Language Models (LLMs). Existing approaches rely primarily on fine-tuning-based methods, which can be categorized into gradient ascent-based (GA-based) and suppression-based methods. However, both often degrade model utility (the ability to respond to normal prompts). In this work, we aim to develop a general framework that enhances the utility of fine-tuning-based unlearning methods. To achieve this goal, we first investigate what GA-based and suppression-based methods have in common. We unveil that GA-based methods unlearn by distinguishing the target data (i.e., the data to be removed) and suppressing related generations, which is essentially the same strategy employed by suppression-based methods. Inspired by this finding, we introduce Gated Representation UNlearning (GRUN), which has two components: a soft gate function for distinguishing target data, and a suppression module that uses Representation Fine-tuning (ReFT) to adjust representations rather than model parameters. Experiments show that GRUN significantly improves both unlearning effectiveness and utility. Moreover, it generalizes across fine-tuning-based methods, is efficient, and is promising for sequential unlearning.
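The abstract describes GRUN as a soft gate composed with a ReFT-style representation edit, applied on top of a frozen model. The sketch below illustrates that composition in NumPy under our own assumptions: the gate, the low-rank edit, and all dimensions (`hidden_dim`, `rank`) are hypothetical placeholders, not the paper's actual parameterization or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedReFTSketch:
    """Illustrative sketch of GRUN's structure (not the paper's code):
    a soft gate scores whether a hidden state relates to the target
    data, and a low-rank ReFT-style edit suppresses the representation
    in proportion to that score. Base model weights are untouched."""

    def __init__(self, hidden_dim: int, rank: int = 4):
        # Gate parameters: a linear score squashed to (0, 1).
        self.w_gate = rng.normal(scale=0.1, size=(hidden_dim,))
        self.b_gate = 0.0
        # Low-rank intervention: project down to `rank`, then back up.
        self.down = rng.normal(scale=0.1, size=(hidden_dim, rank))
        self.up = rng.normal(scale=0.1, size=(rank, hidden_dim))

    def __call__(self, h: np.ndarray) -> np.ndarray:
        g = sigmoid(h @ self.w_gate + self.b_gate)   # ~1 for target-related states
        delta = (h @ self.down) @ self.up            # representation-level edit
        return h + g[..., None] * delta              # gated suppression
```

For non-target prompts the gate output stays near zero, so the edited representation stays close to the original and utility is preserved; for target-related states the gate opens and the low-rank edit steers generation away.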
Feb-24-2025