Continual Learning With Quasi-Newton Methods
Vander Eeckt, Steven; Van hamme, Hugo
–arXiv.org Artificial Intelligence
Received 17 February 2025, accepted 5 March 2025, date of publication 13 March 2025, date of current version 21 March 2025.

Continual Learning with Quasi-Newton Methods
STEVEN VANDER EECKT and HUGO VAN HAMME (Senior Member, IEEE)
Department of Electrical Engineering ESAT-PSI, KU Leuven, B-3001 Leuven, Belgium
Corresponding author: Steven Vander Eeckt (e-mail: steven.vandereeckt@esat.kuleuven.be)

ABSTRACT Catastrophic forgetting remains a major challenge when neural networks learn tasks sequentially. Elastic Weight Consolidation (EWC) attempts to address this problem by introducing a Bayesian-inspired regularization loss to preserve knowledge of previously learned tasks. However, EWC relies on a Laplace approximation in which the Hessian is simplified to the diagonal of the Fisher information matrix, assuming uncorrelated model parameters. This overly simplistic assumption often leads to poor Hessian estimates, limiting its effectiveness. To overcome this limitation, we introduce Continual Learning with Sampled Quasi-Newton (CSQN), which leverages Quasi-Newton methods to compute more accurate Hessian approximations. Experimental results across four benchmarks demonstrate that CSQN consistently outperforms EWC and other state-of-the-art baselines, including rehearsal-based methods. CSQN reduces EWC's forgetting by 50% and improves its performance by 8% on average. Notably, CSQN achieves superior results on three out of four benchmarks, including the most challenging scenarios, highlighting its potential as a robust solution for continual learning.

INDEX TERMS artificial neural networks, catastrophic forgetting, continual learning, quasi-Newton methods

I. INTRODUCTION
Since the 2010s, Artificial Neural Networks (ANNs) have been able to match or even surpass human performance on a wide variety of tasks. However, when presented with a set of tasks to be learned sequentially, a setting referred to as Continual Learning (CL), ANNs suffer from catastrophic forgetting [1]. Unlike humans, ANNs struggle to retain previously learned knowledge while learning new tasks. Naively adapting an ANN to a new task generally leads to a deterioration in the network's performance on previous tasks. Many CL methods have been proposed to alleviate catastrophic forgetting. One of the most well-known is Elastic Weight Consolidation (EWC) [2], which approaches CL from a Bayesian perspective. After training on a task, EWC uses a Laplace approximation [3] to estimate a posterior distribution over the model parameters for that task. When training on the next task, this posterior is used via a regularization loss to prevent the model from catastrophically forgetting the previous task. To estimate the Hessian, which is needed in the Laplace approximation to measure the (un)certainty of the model parameters, EWC uses the Fisher Information Matrix (FIM). Furthermore, to simplify the computation, EWC assumes that the FIM is approximately diagonal.
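To make the EWC baseline concrete, the sketch below illustrates the standard diagonal-Fisher formulation described above; it is not the authors' code and does not implement CSQN. The diagonal Fisher estimate acts as the precision of the Laplace approximation, and a quadratic penalty anchors training on a new task to the parameters learned on the previous one. Names such as `estimate_diagonal_fisher`, `ewc_penalty`, `model`, `data_loader`, and `lam` are illustrative assumptions.

```python
# Minimal sketch of standard EWC (diagonal Fisher), assuming a PyTorch
# classification model. This is the baseline the paper improves upon,
# not the CSQN method itself.
import torch
import torch.nn.functional as F


def estimate_diagonal_fisher(model, data_loader, device="cpu"):
    """Approximate the diagonal of the Fisher information matrix."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    n_samples = 0
    for inputs, _ in data_loader:
        inputs = inputs.to(device)
        model.zero_grad()
        log_probs = F.log_softmax(model(inputs), dim=-1)
        # Sample labels from the model's own predictive distribution,
        # so the squared gradients estimate the Fisher information.
        sampled = torch.multinomial(log_probs.exp(), 1).squeeze(-1)
        loss = F.nll_loss(log_probs, sampled)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2 * inputs.size(0)
        n_samples += inputs.size(0)
    return {n: f / max(n_samples, 1) for n, f in fisher.items()}


def ewc_penalty(model, fisher, old_params, lam=1.0):
    """Quadratic EWC regularizer: (lam / 2) * sum_i F_i (theta_i - theta*_i)^2."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty
```

When training on the next task, the total objective would be the new task's loss plus `ewc_penalty(model, fisher, old_params, lam)`. As the abstract explains, CSQN replaces this diagonal Fisher estimate with richer Hessian approximations obtained via Quasi-Newton methods, which avoid the assumption that model parameters are uncorrelated.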