Theory on Forgetting and Generalization of Continual Learning
Sen Lin, Peizhong Ju, Yingbin Liang, Ness Shroff
arXiv.org Artificial Intelligence
Continual learning (CL) [41] is a learning paradigm in which an agent must continuously learn a sequence of tasks. To emulate the lifelong learning capability of human beings, the agent is expected to learn new tasks more easily by building on accumulated knowledge from old tasks, and to further improve its performance on old tasks by leveraging the knowledge of new tasks. The former is referred to as forward knowledge transfer and the latter as backward knowledge transfer. A major challenge here is so-called catastrophic forgetting [36], i.e., the agent easily forgets the knowledge of old tasks when learning new ones. Although there have been significant experimental efforts (e.g., [27, 14, 50, 16, 17]) to address the forgetting issue, the theoretical understanding of CL is still at an early stage, with only a few recent attempts, e.g., [49, 12, 16, 17] (see Section 2 for a more detailed discussion of previous theoretical studies of CL). However, none of these existing theoretical results provide an explicit characterization of forgetting and generalization error that depends only on fundamental system parameters and setups (e.g., the number of tasks/samples/parameters, noise level, task similarity, and task order). Our work here thus provides the first-known explicit theoretical result, which enables a comprehensive understanding of which factors are relevant and how they (precisely) affect the forgetting and generalization error of CL. Our main contributions can be summarized as follows. First, we provide theoretical results on the expected value of forgetting and overall generalization error in CL, under a linear regression setup with i.i.d.
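To make the quantities discussed above concrete, the following is a minimal numerical sketch of how forgetting and overall generalization error can be measured in a sequential linear regression setting. It is an illustration under assumed parameters (the dimensions `T`, `n`, `d`, noise level, and the shared-component construction of task similarity are all choices made here, not the paper's exact formulation), and it uses the minimum-distance interpolating update commonly associated with (S)GD on overparameterized linear models rather than any specific algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: T linear regression tasks, each with n samples of dimension d
# (overparameterized: d > n), i.i.d. Gaussian features, and noisy labels.
T, n, d, noise_std = 5, 30, 100, 0.1

# Ground-truth weights: a shared component plus a task-specific one, so that
# "task similarity" can be tuned via the mixing coefficient alpha (an assumption).
alpha = 0.8
shared = rng.standard_normal(d)
true_w = [alpha * shared + (1 - alpha) * rng.standard_normal(d) for _ in range(T)]

def make_task(w):
    X = rng.standard_normal((n, d))
    y = X @ w + noise_std * rng.standard_normal(n)
    return X, y

tasks = [make_task(w) for w in true_w]

def test_error(w, w_true, n_test=2000):
    # Population-style test error estimated on fresh Gaussian features.
    X = rng.standard_normal((n_test, d))
    return np.mean((X @ w - X @ w_true) ** 2)

# Continual learning rule (illustrative): for each new task, move to the
# interpolating solution closest in Euclidean distance to the current weights.
w = np.zeros(d)
err_right_after = []  # test error on task t immediately after learning it
for t, (X, y) in enumerate(tasks):
    w = w + np.linalg.pinv(X) @ (y - X @ w)  # minimum-distance interpolator
    err_right_after.append(test_error(w, true_w[t]))

# Forgetting: average increase in error on old tasks after training on all tasks.
final_errors = [test_error(w, w_true) for w_true in true_w]
forgetting = np.mean([final_errors[t] - err_right_after[t] for t in range(T - 1)])
generalization = np.mean(final_errors)

print(f"forgetting     = {forgetting:.4f}")
print(f"generalization = {generalization:.4f}")
```

Varying `alpha` in this sketch changes how similar the tasks are, which in turn changes how much is forgotten, mirroring the role that task similarity and task order play in the kind of explicit characterization described above.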
arXiv.org Artificial Intelligence
Feb-11-2023