Fortuitous Forgetting in Connectionist Networks
Hattie Zhou, Ankit Vani, Hugo Larochelle, Aaron Courville
arXiv.org Artificial Intelligence
Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact be favorable to learning. We introduce forget-and-relearn as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the relearning step reinforces features that are consistently useful under different conditions. The forget-and-relearn framework unifies many existing iterative training algorithms in the image classification and language emergence literature, and allows us to understand the success of these algorithms in terms of the disproportionate forgetting of undesirable information. We leverage this understanding to improve upon existing algorithms by designing more targeted forgetting operations. Insights from our analysis provide a coherent view on the dynamics of iterative training in neural networks and offer a clear path towards performance improvements.

Forgetting is an inescapable component of human memory. It occurs naturally as neural synapses get removed or altered over time (Wang et al., 2020), and is often thought to be an undesirable characteristic of the human mind. A well-known example is the "spacing effect", which refers to the observation that long-term recall is enhanced by spacing, rather than massing, repeated study sessions. Bjork & Allen (1970) demonstrated that the key to the spacing effect is the decreased accessibility of information in-between sessions. In this work, we study a general learning paradigm that we refer to as forget-and-relearn, and show that forgetting can also benefit learning in artificial neural networks.
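The forget-and-relearn loop described above can be sketched in a minimal form: alternate a relearning step (ordinary training) with a forgetting step that reinitializes part of the model. The toy linear-regression task, the 50% forgetting fraction, and all function names below are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w

def relearn(w, steps=200, lr=0.05):
    """Relearning step: plain gradient descent on squared error."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def forget(w, frac=0.5):
    """Forgetting step: reinitialize a random fraction of the weights."""
    mask = rng.random(w.shape) < frac
    fresh = rng.normal(size=w.shape)
    return np.where(mask, fresh, w)

w = rng.normal(size=10)
for _ in range(5):       # alternate forgetting and relearning
    w = relearn(w)       # reinforce consistently useful features
    w = forget(w)        # remove part of the current solution
w = relearn(w)           # final relearning phase
print(float(np.mean((X @ w - y) ** 2)))
```

In this toy setting the relearning step recovers the solution after each forgetting step; the interesting question studied in the paper is which forgetting operations cause undesirable information to be disproportionately lost across such cycles.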
To generalize to unseen data, we want our models to capture generalizable concepts rather than purely statistical regularities, but these desirable solutions are a small subset of the solution space and often more difficult to learn naturally (Geirhos et al., 2020). Recently, a number of training algorithms have been proposed to improve generalization by iteratively refining the learned solution. Knowledge evolution (Taha et al., 2021) improves generalization by iteratively reinitializing one part of the network while continuously training the other. Iterative magnitude pruning (Frankle & Carbin, 2019; Frankle et al., 2019) removes weights through an iterative pruning-retraining process, and outperforms unpruned models in certain settings. Hoang et al. (2018) iteratively generate synthetic machine translation corpora through back-translation of monolingual data.
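Of the iterative algorithms mentioned above, iterative magnitude pruning has a particularly simple skeleton: train, prune the smallest-magnitude weights, rewind the survivors to their initial values, and retrain. The sketch below runs this loop on a toy sparse-regression task; the prune rate, round count, and variable names are illustrative assumptions, not the published hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
true_w = np.zeros(20)
true_w[:5] = rng.normal(size=5)      # only 5 of 20 features matter
y = X @ true_w

def train(w, mask, steps=300, lr=0.05):
    """Gradient descent on squared error; pruned weights stay at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = w - lr * grad * mask
    return w

w_init = rng.normal(size=20) * 0.1
mask = np.ones(20)
w = w_init.copy()
for _ in range(3):                   # prune-retrain rounds
    w = train(w, mask)
    alive = np.flatnonzero(mask)
    k = max(1, int(0.2 * len(alive)))
    # prune the 20% smallest surviving weights by magnitude
    drop = alive[np.argsort(np.abs(w[alive]))[:k]]
    mask[drop] = 0.0
    w = w_init * mask                # rewind survivors to their init values
w = train(w, mask)
print(int(mask.sum()), float(np.mean((X @ (w * mask) - y) ** 2)))
```

Here pruning acts as the forgetting operation: weights that never grow large (the irrelevant features) are the ones removed, while the relearning rounds keep reinforcing the features that matter.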
Jan-31-2022