Continual learning with hypernetworks
von Oswald, Johannes, Henning, Christian, Sacramento, João, Grewe, Benjamin F.
arXiv.org Artificial Intelligence
Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key observation: instead of relying on recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing previous weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving good performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display an unprecedented capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, in which the number of trainable weights is comparable to or smaller than the target network size. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning properties. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
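The idea of rehearsing weight realizations rather than data can be illustrated with a minimal sketch. It assumes a purely linear hypernetwork and a mean squared penalty on the drift of the generated weights; the names `hypernet`, `output_regularizer`, `theta_snapshot`, and `prev_embeddings` are hypothetical and chosen for illustration only, and the abstract itself does not specify the exact form of the regularizer.

```python
import random


def hypernet(theta, embedding):
    """Toy linear hypernetwork (an assumption): maps a task embedding
    to a flat vector of target-network weights."""
    W, b = theta
    return [
        sum(w_ij * e_j for w_ij, e_j in zip(row, embedding)) + b_i
        for row, b_i in zip(W, b)
    ]


def output_regularizer(theta, theta_snapshot, prev_embeddings):
    """Mean squared drift of the generated weights for earlier tasks.

    theta_snapshot holds the hypernetwork parameters saved before
    training on the new task; no past input-output data is needed.
    """
    if not prev_embeddings:
        return 0.0
    total = 0.0
    for e in prev_embeddings:
        old = hypernet(theta_snapshot, e)  # weights the old tasks relied on
        new = hypernet(theta, e)           # weights the current parameters produce
        total += sum((o - n) ** 2 for o, n in zip(old, new))
    return total / len(prev_embeddings)


rng = random.Random(0)
emb_dim, n_target_weights = 2, 6
W = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(n_target_weights)]
theta_snapshot = (W, [0.0] * n_target_weights)
prev_embeddings = [[1.0, 0.0], [0.0, 1.0]]  # one embedding per earlier task

# Unchanged parameters leave the earlier weight realizations intact.
penalty_unchanged = output_regularizer(theta_snapshot, theta_snapshot, prev_embeddings)

# Shifting every bias by 0.5 moves every generated weight, so the penalty grows.
theta_shifted = (W, [0.5] * n_target_weights)
penalty_shifted = output_regularizer(theta_shifted, theta_snapshot, prev_embeddings)
```

In a training loop, a term of this kind would be added to the loss of the task currently being learned, so that the hypernetwork is pulled back toward the weights it previously generated for earlier task embeddings.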
Jun-3-2019
- Country:
  - Oceania > Australia
    - New South Wales > Sydney (0.14)
  - North America
    - United States > California
      - San Diego County > San Diego (0.04)
    - Canada > Alberta
  - Europe
    - France (0.04)
    - Switzerland > Zürich
      - Zürich (0.14)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
  - Asia > China
- Genre:
  - Research Report > Promising Solution (0.34)
- Industry:
  - Health & Medicine (0.68)
  - Education (0.46)
- Technology: