Review for NeurIPS paper: Variational Bayesian Unlearning
Given that the method is only approximate and forgetting a specific data point cannot be theoretically guaranteed, can the authors comment on how practically applicable the proposed approach is? Or are the GDPR requirements so strict that retraining or provable forgetting is needed to satisfy them, making the paper a nice first step but leaving many open problems before the formal requirements can be met?

- l67 and others refer to the Kullback-Leibler divergence as a distance. Since it is not a distance, owing to its lack of symmetry, it should properly be called a divergence or relative entropy.
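The asymmetry the reviewer points out is easy to verify numerically. A minimal sketch (the helper `kl_gaussian` is illustrative, not from the paper) using the closed-form KL divergence between two univariate Gaussians:

```python
import numpy as np

def kl_gaussian(mu0, s0, mu1, s1):
    # KL(N(mu0, s0^2) || N(mu1, s1^2)) in closed form
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

forward = kl_gaussian(0.0, 1.0, 1.0, 2.0)  # KL(p || q)
reverse = kl_gaussian(1.0, 2.0, 0.0, 1.0)  # KL(q || p)
print(forward, reverse)  # ~0.443 vs ~1.307: the two directions differ
```

Because `KL(p || q) != KL(q || p)` in general, "divergence" (or "relative entropy") is the correct term.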
Variational Bayesian Unlearning
This paper studies the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We frame this problem as minimizing the Kullback-Leibler divergence between the approximate posterior belief of model parameters obtained by directly unlearning from the erased data and the exact posterior belief obtained by retraining with the remaining data. Using the variational inference (VI) framework, we show that this is equivalent to minimizing an evidence upper bound which trades off fully unlearning from the erased data against not entirely forgetting the posterior belief given the full data (i.e., including the remaining data); the latter prevents catastrophic unlearning that can render the model useless. In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging. We propose two novel tricks to tackle this challenge.
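As a point of contrast for the approximate setting the abstract describes, here is a minimal sketch of the conjugate case, where unlearning is exact (variable names are illustrative, not from the paper): in a Beta-Bernoulli model, removing the erased points' sufficient statistics from the full-data posterior recovers the retrained posterior exactly. The paper's challenge arises precisely when no such closed form exists and only a VI approximation of the full-data posterior is available.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=100)   # Bernoulli observations
erased = data[:10]                     # subset to be forgotten
remaining = data[10:]

a0, b0 = 1.0, 1.0                      # Beta(a0, b0) prior
# Exact posterior given the full data: Beta(a0 + heads, b0 + tails)
a_full, b_full = a0 + data.sum(), b0 + (len(data) - data.sum())
# "Unlearn" the erased points by subtracting their sufficient statistics
a_unl = a_full - erased.sum()
b_unl = b_full - (len(erased) - erased.sum())
# Retrain from scratch on the remaining data
a_ret, b_ret = a0 + remaining.sum(), b0 + (len(remaining) - remaining.sum())
print(a_unl == a_ret and b_unl == b_ret)  # True: unlearning is exact here
```

With an approximate posterior this identity breaks, which is why the paper minimizes a KL divergence to the retrained posterior instead of relying on exact cancellation.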