Unlearning via Sparse Representations

Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

arXiv.org Artificial Intelligence 

Both methods, Unlearning via Activations and Unlearning via Examples, successfully unlearn the forget class while having a negligible effect on the model's performance on the retain set. Importantly, this is achieved without any training, retraining, or fine-tuning, as is usually required by other methods. The retain-set test accuracy remains nearly constant across all three datasets, apart from minor fluctuations. This follows from the fact that the localized, context-dependent sparse updates during the model's initial training leave the discrete key representations corresponding to different classes well separated from one another, an important prerequisite discussed by Träuble et al. (2023). As a consequence, in Unlearning via Examples all the information about a class can be unlearned by forgetting only a subset of the forget class's training data, making the method very data-efficient.
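The mechanism described above can be illustrated with a minimal toy sketch of a discrete key-value bottleneck. All names here (`nearest_key`, `unlearn_via_examples`, the key/value shapes) are illustrative assumptions, not the authors' implementation: each input routes to its nearest key, only that key's value row is updated during training (localized sparse updates), and unlearning simply disables the keys selected by a subset of the forget class's examples, with no retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 4 well-separated classes in 8 dimensions.
# The key codebook contains one key near each class center plus 60
# random keys; values accumulate per-class evidence.
n_classes, dim = 4, 8
centers = 5.0 * np.eye(n_classes, dim)
keys = np.vstack([centers, rng.normal(size=(60, dim))])
values = np.zeros((len(keys), n_classes))
alive = np.ones(len(keys), dtype=bool)      # False = key removed

def nearest_key(x):
    d = np.linalg.norm(keys - x, axis=1)
    d[~alive] = np.inf                      # removed keys are never selected
    return int(np.argmin(d))

def train(x, y):
    # Localized sparse update: only the selected key's value row changes,
    # so different classes occupy disjoint subsets of keys.
    values[nearest_key(x), y] += 1.0

def predict(x):
    return int(np.argmax(values[nearest_key(x)]))

def unlearn_via_examples(forget_xs):
    # Disable every key selected by a subset of the forget class's
    # training data; no weights are retrained or fine-tuned.
    for x in forget_xs:
        k = nearest_key(x)
        alive[k] = False
        values[k] = 0.0

# Demo: train on all classes, then unlearn class 0 from 3 of its
# 20 training examples.
xs = {c: centers[c] + 0.1 * rng.normal(size=(20, dim))
      for c in range(n_classes)}
for c, batch in xs.items():
    for x in batch:
        train(x, c)
unlearn_via_examples(xs[0][:3])
```

Because the classes map to well-separated keys, disabling the keys hit by a few forget-class examples removes all class-0 evidence while the retain classes' keys, and hence their predictions, are untouched.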