Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent

Bennani, Mehdi Abbana, Sugiyama, Masashi

arXiv.org Machine Learning 

In Continual Learning settings, deep neural networks are prone to Catastrophic Forgetting. Orthogonal Gradient Descent (OGD) was proposed to tackle this challenge; however, no theoretical guarantees for it have been proven yet. We present a theoretical framework to study Continual Learning algorithms in the Neural Tangent Kernel (NTK) regime. This framework comprises a closed-form expression of the model across tasks, together with proxies for Transfer Learning, generalisation and task similarity. Within this framework, we prove that OGD is robust to Catastrophic Forgetting, and then derive the first generalisation bound for SGD and OGD in Continual Learning. Finally, we study the limits of this framework in practice for OGD and highlight the importance of the variation of the NTK for Continual Learning with OGD.

Continual Learning is a setting in which an agent is exposed to multiple tasks sequentially (Kirkpatrick et al., 2016). The core challenge lies in the agent's ability to learn new tasks while retaining the knowledge acquired from previous ones. Too much plasticity (Nguyen et al., 2018) leads to catastrophic forgetting, i.e. the degradation of the agent's ability to perform past tasks (McCloskey & Cohen, 1989; Ratcliff, 1990; Goodfellow et al., 2014). On the other hand, too much stability prevents the agent from adapting to new tasks. While there is a large literature on Continual Learning (Parisi et al., 2019), few works have addressed the problem from a theoretical perspective. Recently, Jacot et al. (2018) established a connection between overparameterized neural networks and kernel methods by introducing the Neural Tangent Kernel (NTK).
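To make the object of study concrete, here is a minimal sketch of the projection step at the heart of Orthogonal Gradient Descent, assuming its usual formulation: gradients of the model's outputs on samples from earlier tasks are stored, orthonormalised, and every new-task gradient is projected onto their orthogonal complement before the update. The function names (`orthonormal_basis`, `ogd_project`, `ogd_step`), the learning rate and the toy vectors are hypothetical and chosen for illustration only; parameters are assumed to be flattened into a single NumPy vector. This is not the authors' implementation.

```python
import numpy as np


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: return orthonormal rows spanning the stored gradient directions."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        for b in basis:
            w -= np.dot(w, b) * b  # remove the component along an existing direction
        norm = np.linalg.norm(w)
        if norm > tol:  # skip (near-)linearly dependent directions
            basis.append(w / norm)
    return np.array(basis)


def ogd_project(grad, basis):
    """Project a new-task gradient onto the orthogonal complement of the stored basis."""
    g = np.asarray(grad, dtype=float).copy()
    for b in basis:
        g -= np.dot(g, b) * b
    return g


def ogd_step(params, grad, basis, lr=0.1):
    """One hypothetical OGD update: a plain gradient step along the projected direction."""
    return np.asarray(params, dtype=float) - lr * ogd_project(grad, basis)


# Toy example: two stored output-gradients from a previous task (hypothetical values).
stored = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(stored)
projected = ogd_project([3.0, 4.0, 5.0], basis)
print(projected)  # only the third coordinate survives
print(ogd_step(np.zeros(3), [3.0, 4.0, 5.0], basis))
```

The design choice explains why the NTK regime is a natural setting for the analysis: if the network is well approximated by its linearisation, $f(x;\theta+\Delta) \approx f(x;\theta) + \nabla_\theta f(x;\theta)^\top \Delta$, then an update $\Delta$ orthogonal to the stored gradients leaves the outputs on those earlier samples unchanged to first order.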
