Jogging the Memory of Unlearned Model Through Targeted Relearning Attack

Hu, Shengyuan, Fu, Yiwei, Wu, Zhiwei Steven, Smith, Virginia

Jun-19-2024–arXiv.org Artificial Intelligence

Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to reverse the effects of unlearning. We formalize this unlearning-relearning pipeline, explore the attack across three popular unlearning benchmarks, and discuss future directions and guidelines that result from our study.

large language model, machine learning, natural language

Country:
- Asia (0.28)
- North America > United States (0.28)

Genre:
- Research Report
  - New Finding (0.46)
  - Promising Solution (0.35)

Industry:
- Education (0.93)
- Health & Medicine
  - Epidemiology (0.97)
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area
    - Immunology (1.00)
    - Infections and Infectious Diseases (1.00)
    - Pulmonary/Respiratory Diseases (0.69)
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.68)
  - Natural Language > Large Language Model (0.91)
