Towards guarantees for parameter isolation in continual learning

Lanzillotta, Giulia, Singh, Sidak Pal, Grewe, Benjamin F., Hofmann, Thomas

arXiv.org Artificial Intelligence 

Deep learning has proved to be a successful paradigm for solving many challenges in machine learning. However, deep neural networks fail when trained sequentially on multiple tasks, a shortcoming known as catastrophic forgetting in the continual learning literature. Despite a recent flurry of learning algorithms successfully addressing this problem, we find that provable guarantees against catastrophic forgetting are lacking. In this work, we study the relationship between learning and forgetting by examining the geometry of the neural network loss landscape. We offer a unifying perspective on a family of continual learning algorithms, namely methods based on parameter isolation, and we establish guarantees on catastrophic forgetting for some of them.

Statistical models based on deep neural networks are entrusted with ever more complex tasks in real-world applications. In real-world environments, the ability to continually and rapidly learn new behaviors is crucial. It is therefore worthwhile to understand how deep neural networks store and integrate new information. In this paper, we study neural networks in the continual learning setting, where the input to the learning algorithm is a data stream. In this setting, it has been observed that training neural networks on new data often severely degrades performance on old data, a phenomenon termed catastrophic forgetting (McCloskey & Cohen, 1989), which we will often simply refer to as forgetting hereafter. Generally speaking, continual learning algorithms address catastrophic forgetting by leveraging storage external to the network and imposing constraints (implicit or explicit) that ensure the network does not stray too far from the prior tasks when a new task is given. The storage is updated with each new learning task, and its specific contents depend on the algorithm: typical examples include vectors in the parameter space (network parameters or gradients), input samples, or neural activities.
We review the main trends in the literature in the related work section (Section 2).
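To make the parameter-isolation idea concrete, the following is a minimal sketch, not the paper's method: each task claims a binary mask over the parameter vector (the "external storage" here is just the accumulated mask), and gradient updates for later tasks are zeroed on coordinates frozen by earlier tasks, so performance that depends only on frozen parameters cannot degrade. The function and variable names are illustrative assumptions.

```python
import numpy as np

def apply_isolated_update(params, grad, frozen_mask, lr=0.1):
    """One gradient step that leaves parameters frozen by earlier tasks untouched.

    frozen_mask is a boolean array: True marks coordinates claimed by
    previous tasks; ~frozen_mask zeroes the update on those coordinates.
    """
    return params - lr * grad * (~frozen_mask)

params = np.array([1.0, 2.0, 3.0, 4.0])
# Suppose task 1 used (and therefore froze) the first two parameters.
frozen = np.array([True, True, False, False])
grad = np.array([0.5, 0.5, 0.5, 0.5])

new_params = apply_isolated_update(params, grad, frozen)
# Frozen coordinates are unchanged; the free coordinates take the step.
```

In this toy setting the guarantee is immediate: any loss that depends only on the frozen coordinates is exactly preserved. The guarantees studied in the paper are of course more subtle, since real methods isolate parameters only approximately and losses couple all coordinates.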
