continual meta-learning
On the Stability-Plasticity Dilemma in Continual Meta-Learning: Theory and Algorithm
We focus on Continual Meta-Learning (CML), which targets accumulating and exploiting meta-knowledge on a sequence of non-i.i.d. The primary challenge is to strike a balance between stability and plasticity, where a model should be stable to avoid catastrophic forgetting in previous tasks and plastic to learn generalizable concepts from new tasks. To address this, we formulate the CML objective as controlling the average excess risk upper bound of the task sequence, which reflects the trade-off between forgetting and generalization. Based on the objective, we introduce a unified theoretical framework for CML in both static and shifting environments, providing guarantees for various task-specific learning algorithms. Moreover, we first present a rigorous analysis of a bi-level trade-off in shifting environments.
Learning from the Past: Continual Meta-Learning via Bayesian Graph Modeling
Luo, Yadan, Huang, Zi, Zhang, Zheng, Wang, Ziwei, Baktashmotlagh, Mahsa, Yang, Yang
Meta-learning for few-shot learning allows a machine to leverage previously acquired knowledge as a prior, thus improving the performance on novel tasks with only small amounts of data. However, most mainstream models suffer from catastrophic forgetting and insufficient robustness issues, thereby failing to fully retain or exploit long-term knowledge while being prone to cause severe error accumulation. In this paper, we propose a novel Continual Meta-Learning approach with Bayesian Graph Neural Networks (CML-BGNN) that mathematically formulates meta-learning as continual learning of a sequence of tasks. With each task forming as a graph, the intra- and inter-task correlations can be well preserved via message-passing and history transition. To remedy topological uncertainty from graph initialization, we utilize Bayes by Backprop strategy that approximates the posterior distribution of task-specific parameters with amortized inference networks, which are seamlessly integrated into the end-to-end edge learning. Extensive experiments conducted on the miniImageNet and tieredImageNet datasets demonstrate the effectiveness and efficiency of the proposed method, improving the performance by 42.8% compared with state-of-the-art on the miniImageNet 5-way 1-shot classification task.