Unlearning via Sparse Representations

Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

arXiv.org Artificial Intelligence 

Both methods, Unlearning via Activations and Unlearning via Examples, successfully unlearn the forget class while having a negligible effect on the model's performance on the retain set. Importantly, this is achieved without any training, retraining, or fine-tuning, as is usually required by other methods. The retain-set test accuracy remains nearly constant across all three datasets, apart from minor fluctuations. This follows from the fact that the localized, context-dependent sparse updates during the model's initial training leave the discrete key representations corresponding to different classes well separated from one another, an important prerequisite discussed by Träuble et al. (2023). As a consequence, in Unlearning via Examples all the information about a class can be unlearned by forgetting only a subset of the forget class's training data, making the method very data-efficient.
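The mechanism described above can be illustrated with a minimal toy sketch of a discrete key-value bottleneck. All names here (`nearest_key`, `unlearn_via_examples`, the key/value shapes) are illustrative assumptions, not the authors' implementation: each input routes to its nearest key, only that key's value row is updated during training (localized sparse updates), and unlearning simply disables the keys selected by a subset of the forget class's examples, with no retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 4 well-separated classes in 8 dimensions.
# The key codebook contains one key near each class center plus 60
# random keys; values accumulate per-class evidence.
n_classes, dim = 4, 8
centers = 5.0 * np.eye(n_classes, dim)
keys = np.vstack([centers, rng.normal(size=(60, dim))])
values = np.zeros((len(keys), n_classes))
alive = np.ones(len(keys), dtype=bool)      # False = key removed

def nearest_key(x):
    d = np.linalg.norm(keys - x, axis=1)
    d[~alive] = np.inf                      # removed keys are never selected
    return int(np.argmin(d))

def train(x, y):
    # Localized sparse update: only the selected key's value row changes,
    # so different classes occupy disjoint subsets of keys.
    values[nearest_key(x), y] += 1.0

def predict(x):
    return int(np.argmax(values[nearest_key(x)]))

def unlearn_via_examples(forget_xs):
    # Disable every key selected by a subset of the forget class's
    # training data; no weights are retrained or fine-tuned.
    for x in forget_xs:
        k = nearest_key(x)
        alive[k] = False
        values[k] = 0.0

# Demo: train on all classes, then unlearn class 0 from 3 of its
# 20 training examples.
xs = {c: centers[c] + 0.1 * rng.normal(size=(20, dim))
      for c in range(n_classes)}
for c, batch in xs.items():
    for x in batch:
        train(x, c)
unlearn_via_examples(xs[0][:3])
```

Because the classes map to well-separated keys, disabling the keys hit by a few forget-class examples removes all class-0 evidence while the retain classes' keys, and hence their predictions, are untouched.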