Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research
Neural Information Processing Systems
Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of Spiderman. Both of these goals (the targeted removal of information from a model and the targeted suppression of information from a model's outputs) present various technical and substantive challenges. We provide a framework for ML researchers and policymakers to think rigorously about these challenges, identifying several mismatches between the goals of unlearning and feasible implementations. These mismatches explain why unlearning is not a general-purpose solution for circumscribing generative-AI model behavior in service of broader positive impact.
Jun-13-2026, 01:06:53 GMT