CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept

Wu, YuXuan, Dossou, Bonaventure F. P., Liu, Dianbo

Oct-8-2024–arXiv.org Artificial Intelligence

Large Language Models (LLMs) offer extensive knowledge across various domains, but they may inadvertently memorize sensitive, unauthorized, or malicious data, such as personal information in the medical and financial sectors. Machine unlearning methods address this by removing specific information from a model after training. However, current approaches either require additional model training or struggle to erase particular data points and their associated context, owing to the complex, dense, and continuous nature of LLMs. In this study, we propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs). By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data. To the best of our knowledge, this is the first work that successfully enables unlearning of specific topics with contextual relevance in an LLM, marking a significant step towards real-world applications of machine unlearning.

Large Language Models (LLMs) have been widely used in various applications, generating text responses that attempt to emulate human conversation OpenAI et al. (2024). These models draw on vast scientific literature to facilitate and accelerate interdisciplinary research Taylor et al. (2022), and on large datasets of human-generated content to provide professional advice. However, such data is often a double-edged sword: including personal information or sensitive scientific knowledge can be beneficial or, conversely, harmful. For instance, Soice et al. (2023) discuss how LLMs, when used by non-experts, can enable the creation of biological agents, posing both potential benefits and significant risks.
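The abstract does not spell out the mechanism, so the following is only a minimal, hypothetical sketch of the general idea of a discrete bottleneck used for training-free unlearning. Each activation is passed through only its top-k most similar code vectors (cosine similarity), and "unlearning" disables codes that a forget set selects but a retain set does not. The class `CodebookBottleneck`, the top-k cosine selection, and the forget/retain rule are illustrative assumptions, not the authors' implementation; the SAE component is omitted.

```python
import math
from typing import List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CodebookBottleneck:
    """Hypothetical codebook layer: an activation is replaced by the sum of its
    top-k most similar enabled codes, so information can only flow through a
    small, discrete set of code vectors."""

    def __init__(self, codes: List[List[float]], k: int = 1):
        self.codes = codes
        self.k = k
        self.disabled: Set[int] = set()  # indices of codes removed by unlearning

    def select(self, activation: Sequence[float]) -> List[int]:
        """Return indices of the k enabled codes most similar to the activation."""
        candidates = [i for i in range(len(self.codes)) if i not in self.disabled]
        candidates.sort(key=lambda i: cosine(activation, self.codes[i]), reverse=True)
        return candidates[: self.k]

    def forward(self, activation: Sequence[float]) -> List[float]:
        """Reconstruct the activation from its selected codes only."""
        out = [0.0] * len(activation)
        for i in self.select(activation):
            out = [o + c for o, c in zip(out, self.codes[i])]
        return out

    def unlearn(
        self,
        forget: List[Sequence[float]],
        retain: List[Sequence[float]],
    ) -> Set[int]:
        """Assumed rule: disable codes used by forget-set activations but not by
        retain-set activations. No gradient step is taken."""
        forget_codes = {i for a in forget for i in self.select(a)}
        retain_codes = {i for a in retain for i in self.select(a)}
        removed = forget_codes - retain_codes
        self.disabled |= removed
        return removed


if __name__ == "__main__":
    bn = CodebookBottleneck([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], k=1)
    print(bn.forward([0.9, 0.1]))  # [1.0, 0.0]: code 0 is selected
    bn.unlearn(forget=[[0.9, 0.1]], retain=[[0.1, 0.9]])
    print(bn.forward([0.9, 0.1]))  # [0.0, 1.0]: code 0 is no longer available
```

Because the edit only masks codes, it is cheap and needs no retraining, which matches the spirit of an amortized, zero-shot approach. Retain-set inputs that never used a removed code pass through unchanged.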

information, large language model, machine learning


Country:
- North America
  - United States
    - Pennsylvania > Philadelphia County
      - Philadelphia (0.04)
    - New York > New York County
      - New York City (0.04)
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
  - Canada > Quebec
    - Montreal (0.14)
- Asia > Singapore
  - Central Region > Singapore (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Information Technology > Security & Privacy (0.86)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.88)