Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs

Kadhe, Swanand Ravindra, Ahmed, Farhan, Wei, Dennis, Baracaldo, Nathalie, Padhi, Inkit

Jun-17-2024–arXiv.org Artificial Intelligence

Large language models (LLMs) have shown to pose social and ethical risks such as generating toxic language or facilitating malicious use of hazardous knowledge. Machine unlearning is a promising approach to improve LLM safety by directly removing harmful behaviors and knowledge. In this paper, we propose "SPlit, UNlearn, MerGE" (SPUNGE), a framework that can be used with any unlearning method to amplify its effectiveness. SPUNGE leverages data attributes during unlearning by splitting unlearning data into subsets based on specific attribute values, unlearning each subset separately, and merging the unlearned models. We empirically demonstrate that SPUNGE significantly improves the performance of two recent unlearning methods on state-of-the-art LLMs while maintaining their general capabilities on standard academic benchmarks.

knowledge, punge, toxicity, (16 more...)

arXiv.org Artificial Intelligence

Jun-17-2024

arXiv.org PDF

Add feedback

Country:
- South America > Colombia
  - Meta Department > Villavicencio (0.04)
- North America
  - Dominican Republic (0.04)
  - United States > Minnesota
    - Hennepin County > Minneapolis (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - China > Hong Kong (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Industry:
- Information Technology (0.47)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found