Goto

Collaborating Authors

 Eeckt, Steven Vander


Continual Learning With Quasi-Newton Methods

arXiv.org Artificial Intelligence

Received 17 February 2025, accepted 5 March 2025, date of publication 13 March 2025, date of current version 21 March 2025. Continual Learning with Quasi-Newton Methods STEVEN VANDER EECKT and HUGO VAN HAMME (Senior, IEEE) Department Electrical Engineering ESAT-PSI, KU Leuven, B-3001 Leuven, Belgium Corresponding author: Steven Vander Eeeckt (e-mail: steven.vandereeckt@esat.kuleuven.be).ABSTRACT Catastrophic forgetting remains a major challenge when neural networks learn tasks sequentially. Elastic Weight Consolidation (EWC) attempts to address this problem by introducing a Bayesian-inspired regularization loss to preserve knowledge of previously learned tasks. However, EWC relies on a Laplace approximation where the Hessian is simplified to the diagonal of the Fisher information matrix, assuming uncorrelated model parameters. This overly simplistic assumption often leads to poor Hessian estimates, limiting its effectiveness. To overcome this limitation, we introduce Continual Learning with Sampled Quasi-Newton (CSQN), which leverages Quasi-Newton methods to compute more accurate Hessian approximations. Experimental results across four benchmarks demonstrate that CSQN consistently outperforms EWC and other state-of-the-art baselines, including rehearsal-based methods. CSQN reduces EWC's forgetting by 50% and improves its performance by 8% on average. Notably, CSQN achieves superior results on three out of four benchmarks, including the most challenging scenarios, highlighting its potential as a robust solution for continual learning.INDEX TERMS artificial neural networks, catastrophic forgetting, continual learning, quasi-Newton methods I. INTRODUCTION Since the 2010s, Artificial Neural Networks (ANNs) have been able to match or even surpass human performance on a wide variety of tasks. However, when presented with a set of tasks to be learned sequentially--a setting referred to as Continual Learning (CL)--ANNs suffer from catastrophic forgetting [1]. Unlike humans, ANNs struggle to retain previously learned knowledge when extending their knowledge. Naively adapting an ANN to a new task generally leads to a deterioration in the network's performance on previous tasks. Many CL methods have been proposed to alleviate catastrophic forgetting. One of the most well-known is Elastic Weight Consolidation (EWC) [2], which approaches CL from a Bayesian perspective. After training on a task, EWC uses Laplace approximation [3] to estimate a posterior distribution over the model parameters for that task. When training on the next task, this posterior is used via a regularization loss to prevent the model from catastrophically forgetting the previous task. To estimate the Hessian, which is needed in the Laplace approximation to measure the (un)certainty of the model parameters, EWC uses the Fisher Information Matrix (FIM). Furthermore, to simplify the computation, EWC assumes that the FIM is approximately diagonal.


Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition

arXiv.org Artificial Intelligence

Adapting a trained Automatic Speech Recognition (ASR) model to new tasks results in catastrophic forgetting of old tasks, limiting the model's ability to learn continually and to be extended to new speakers, dialects, languages, etc. Focusing on End-to-End ASR, in this paper, we propose a simple yet effective method to overcome catastrophic forgetting: weight averaging. By simply taking the average of the previous and the adapted model, our method achieves high performance on both the old and new tasks. It can be further improved by introducing a knowledge distillation loss during the adaptation. We illustrate the effectiveness of our method on both monolingual and multilingual ASR. In both cases, our method strongly outperforms all baselines, even in its simplest form.


Continual Learning for Monolingual End-to-End Automatic Speech Recognition

arXiv.org Machine Learning

Adapting Automatic Speech Recognition (ASR) models to new domains leads to a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from CF, making them unable to be continually enhanced without storing all past data. Fortunately, Continual Learning (CL) methods, which aim to enable continual adaptation while overcoming CF, can be used. In this paper, we implement an extensive number of CL methods for End-to-End ASR and test and compare their ability to extend a monolingual Hybrid CTC-Transformer model across four new tasks. We find that the best performing CL method closes the gap between the fine-tuned model (lower bound) and the model trained jointly on all tasks (upper bound) by more than 40%, while requiring access to only 0.6% of the original data.