Jogging the Memory of Unlearned Model Through Targeted Relearning Attack
Shengyuan Hu, Yiwei Fu, Zhiwei Steven Wu, Virginia Smith
arXiv.org Artificial Intelligence
Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to reverse the effects of unlearning. We formalize this unlearning-relearning pipeline, explore the attack across three popular unlearning benchmarks, and discuss future directions and guidelines that result from our study.
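To make the attack setting concrete, the sketch below shows one way such a targeted relearning step could look in practice: briefly fine-tuning an unlearned model on a small set of loosely related passages. This is an illustrative sketch under stated assumptions, not the authors' code; the model path `unlearned-llm`, the placeholder contents of `relearn_texts`, and all hyperparameters are hypothetical.

```python
# Illustrative sketch of a relearning step, NOT the paper's implementation.
# Assumptions (hypothetical): "unlearned-llm" is a locally saved unlearned model;
# relearn_texts holds a small, loosely related set of passages.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_path = "unlearned-llm"              # hypothetical path to the unlearned model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

relearn_texts = ["..."]                   # hypothetical small relearning set

class RelearnDataset(torch.utils.data.Dataset):
    """Tokenize each passage and reuse it as its own language-modeling label."""
    def __init__(self, texts, tokenizer, max_len=512):
        self.enc = [tokenizer(t, truncation=True, max_length=max_len,
                              return_tensors="pt") for t in texts]
    def __len__(self):
        return len(self.enc)
    def __getitem__(self, i):
        ids = self.enc[i]["input_ids"].squeeze(0)
        mask = self.enc[i]["attention_mask"].squeeze(0)
        return {"input_ids": ids, "attention_mask": mask, "labels": ids.clone()}

# Brief fine-tuning on the small relearning set ("jogging" the model's memory).
args = TrainingArguments(output_dir="relearned-llm", num_train_epochs=3,
                         per_device_train_batch_size=1, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=RelearnDataset(relearn_texts, tokenizer))
trainer.train()
```

After this step, one would re-query the fine-tuned model on the supposedly forgotten content to check whether the effects of unlearning have been reversed.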
Jun-19-2024