The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning
Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, Sanmi Koyejo
– arXiv.org Artificial Intelligence
Machine unlearning, the process of selectively removing data from trained models, is increasingly crucial for addressing privacy concerns and knowledge gaps post-deployment. Despite this importance, existing approaches are often heuristic and lack formal guarantees. In this paper, we analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning, providing rigorous certification analogous to differential privacy. For in-distribution forget data -- data similar to the retain set -- we show that a surprisingly simple and general procedure, empirical risk minimization with output perturbation, achieves tight unlearning-utility-complexity trade-offs, closing a previous theoretical gap on its separation from unlearning "for free" via differential privacy, which inherently facilitates the removal of such data. However, such techniques fail for out-of-distribution forget data -- data significantly different from the retain set -- where unlearning time complexity can exceed that of retraining, even for a single sample. To address this, we propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.
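The abstract names two mechanisms: empirical risk minimization (ERM) with output perturbation for in-distribution forget data, and a robust, noisy gradient descent variant for out-of-distribution forget data. The sketch below is a minimal Python illustration of both ideas, assuming the output-perturbation step amounts to approximate ERM on the retain set followed by Gaussian noise on the parameters, and assuming the robust variant clips per-sample gradients before averaging and adding per-step noise. All function names, the noise scale `sigma`, and the clipping rule are illustrative placeholders, not the paper's exact algorithm or calibration.

```python
import numpy as np

def unlearn_output_perturbation(theta, retain_grad, lr=0.1, steps=50,
                                sigma=0.05, rng=None):
    """Sketch of ERM with output perturbation: approximately minimize the
    retain-set risk by gradient descent, then add Gaussian noise to the
    final parameters. In the paper's setting, sigma would be calibrated
    to the (epsilon, delta) unlearning budget; here it is a placeholder.

    retain_grad: callable mapping parameters -> gradient of the
    retain-set loss (assumed given).
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(steps):
        theta -= lr * retain_grad(theta)          # approximate ERM on retain data
    return theta + rng.normal(scale=sigma, size=theta.shape)

def robust_noisy_gd_step(theta, per_sample_grads, lr=0.1, clip=1.0,
                         sigma=0.05, rng=None):
    """One step of a robust, noisy gradient descent variant (illustrative):
    clip each per-sample gradient to bound the influence of any single
    (possibly out-of-distribution) point, average, and add Gaussian noise.
    Training this way is what allows later deletions to be absorbed
    cheaply, i.e., the unlearning time complexity is amortized.

    per_sample_grads: iterable of gradient arrays, one per sample.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / max(norm, 1e-12)))  # norm clipping
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(scale=sigma, size=theta.shape)
    return theta - lr * (avg + noise)
```

Intuitively, the Gaussian output noise that certifies removal of in-distribution points is the same mechanism differential privacy relies on, which is why such points can be unlearned almost "for free"; the clipping in the noisy variant bounds how much any single out-of-distribution sample can move the training trajectory, so deleting it later does not require retraining from scratch.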
Dec-12-2024
- Country:
- Europe (0.28)
- Genre:
- Research Report (0.64)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Law (0.93)