Automating Continual Learning
Irie, Kazuki; Csordás, Róbert; Schmidhuber, Jürgen
arXiv.org Artificial Intelligence, Nov-30-2023
General-purpose learning systems should improve themselves in an open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF)--previously acquired skills are forgotten when a new task is learned. Instead of hand-crafting new algorithms for avoiding CF, we propose Automated Continual Learning (ACL) to train self-referential neural networks to meta-learn their own in-context continual (meta-)learning algorithms. Our experiments demonstrate that ACL effectively solves "in-context catastrophic forgetting"; our ACL-learned algorithms outperform hand-crafted ones, e.g., on the Split-MNIST benchmark in the replay-free setting, and enable continual learning of diverse tasks consisting of multiple few-shot and standard image classification datasets.

Enemies of memories are other memories (Eagleman, 2020). Continually learning artificial neural networks (NNs) are memory systems: their weights store memories of task-solving skills or programs, and their learning algorithm is responsible for memory read/write operations. Conventional learning algorithms--used to train NNs in the standard scenario where all training data is available at once--are known to be inadequate for continual learning (CL) of multiple tasks, where data for each task is available sequentially and exclusively, one task at a time. They suffer from "catastrophic forgetting" (CF; McCloskey & Cohen, 1989; Ratcliff, 1990; French, 1999; McClelland et al., 1995): the NNs forget, or rather, the learning algorithm erases, previously acquired skills in exchange for learning to solve a new task.

Naturally, a certain degree of forgetting is unavoidable when memory capacity is limited and the amount of information to remember exceeds that bound. In general, however, capacity is not the fundamental cause of CF: typically, the same NNs that suffer from CF when trained on two tasks sequentially can perform well on both tasks when trained on them jointly instead (see, e.g., Irie et al., 2022a). The real root of CF lies in the learning algorithm as a memory mechanism. A "good" CL algorithm should preserve previously acquired knowledge while also leveraging previous learning experiences to improve future learning, maximally exploiting the limited memory space of model parameters. All of this is the decision-making problem of the learning algorithm.
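To make the ACL objective described above concrete, below is a minimal sketch of meta-learning an in-context continual learner: a sequence model consumes labeled examples from task A, then from task B, and the meta-loss evaluates held-out queries from both tasks after the full stream, so that gradient descent on the meta-loss penalizes in-context forgetting. This is an illustrative sketch under stated assumptions, not the authors' implementation: a generic causal Transformer stands in for the paper's self-referential weight-matrix network, positional encodings are omitted for brevity, and all names, dimensions, and the two-task stream layout are assumptions of this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InContextLearner(nn.Module):
    """Causal sequence model that reads (x, y) pairs in context and predicts
    labels at query positions. A stand-in for the paper's self-referential NN."""
    def __init__(self, x_dim, n_classes, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.embed_x = nn.Linear(x_dim, d_model)
        # n_classes + 1 label tokens: the real labels, plus an "unknown"
        # token marking query positions whose labels must be predicted.
        self.embed_y = nn.Embedding(n_classes + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.readout = nn.Linear(d_model, n_classes)

    def forward(self, xs, ys):
        # xs: (B, T, x_dim) inputs; ys: (B, T) labels ("unknown" at queries).
        h = self.embed_x(xs) + self.embed_y(ys)
        T = xs.size(1)
        causal = torch.triu(
            torch.full((T, T), float('-inf'), device=xs.device), diagonal=1)
        return self.readout(self.backbone(h, mask=causal))  # (B, T, n_classes)

def acl_meta_loss(model, task_a, task_b, n_classes):
    """Two-task ACL-style meta-loss: after seeing task A's then task B's
    examples in context, the model answers held-out queries from BOTH tasks,
    so backpropagation through the stream punishes in-context forgetting."""
    (xa, ya, qxa, qya) = task_a  # support inputs/labels, query inputs/labels
    (xb, yb, qxb, qyb) = task_b
    q_targets = torch.cat([qya, qyb], dim=1)
    unk = torch.full_like(q_targets, n_classes)         # "unknown" label token
    xs = torch.cat([xa, xb, qxa, qxb], dim=1)           # stream: A, B, queries
    ys = torch.cat([ya, yb, unk], dim=1)
    logits = model(xs, ys)
    n_q = q_targets.size(1)
    q_logits = logits[:, -n_q:]                         # query-position outputs
    return F.cross_entropy(
        q_logits.reshape(-1, n_classes), q_targets.reshape(-1))

# One meta-training step, with task_a and task_b freshly sampled from a
# distribution over tasks (e.g., disjoint class subsets, as in Split-MNIST):
#   loss = acl_meta_loss(model, task_a, task_b, n_classes)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```

The design point this sketch tries to mirror is the one the passage argues for: because task A's queries appear only after task B's examples in the stream, any in-context learning rule the model discovers must retain task A's skills to minimize the meta-loss, turning the avoidance of forgetting into part of the meta-learned algorithm itself.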