Goto

Collaborating Authors

 gradient episodic memory


Gradient Episodic Memory for Continual Learning

Neural Information Processing Systems

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.


Gradient Episodic Memory for Continual Learning

Neural Information Processing Systems

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.



Review for NeurIPS paper: Continual Deep Learning by Functional Regularisation of Memorable Past

Neural Information Processing Systems

What are the real contributions of the paper? The idea of regularizing the outputs (or functional-regularization) has already been explored, as already said in the paper. Combining the idea of regularizing the outputs with memory-based methods is also already explored. Please see GEM [1] and A-GEM [2]. What makes this approach better or important, e.g.


Reviews: Gradient Episodic Memory for Continual Learning

Neural Information Processing Systems

The authors of the manuscript consider the continuum learning setting, where the learner observes a stream of data points from training, which are ordered according to the tasks they belong to, i.e. the learner encounters any data from the next task only after it has observed all the training data for the current one. The authors propose a set of three metrics for evaluation performance of learning algorithms in this setting, which reflect their ability to transfer information to new tasks and not forget information about the earlier tasks. Could the authors, please, comment on the difference between continuum and lifelong learning (the corresponding sentence in line 254 seems incomplete)? The authors also propose a learning method, termed Gradient of Episodic Memory (GEM). The idea of the method is to keep a set of examples from every observed task and make sure that at each update stage the loss on the observed tasks does not increase.


Gradient Episodic Memory for Continual Learning

Neural Information Processing Systems

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.


A Multi-Task Approach to Robust Deep Reinforcement Learning for Resource Allocation

arXiv.org Artificial Intelligence

With increasing complexity of modern communication systems, machine learning algorithms have become a focal point of research. However, performance demands have tightened in parallel to complexity. For some of the key applications targeted by future wireless, such as the medical field, strict and reliable performance guarantees are essential, but vanilla machine learning methods have been shown to struggle with these types of requirements. Therefore, the question is raised whether these methods can be extended to better deal with the demands imposed by such applications. In this paper, we look at a combinatorial resource allocation challenge with rare, significant events which must be handled properly. We propose to treat this as a multi-task learning problem, select two methods from this domain, Elastic Weight Consolidation and Gradient Episodic Memory, and integrate them into a vanilla actor-critic scheduler. We compare their performance in dealing with Black Swan Events with the state-of-the-art of augmenting the training data distribution and report that the multi-task approach proves highly effective.


An Introduction to Lifelong Supervised Learning

arXiv.org Artificial Intelligence

This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2 which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide 8 Introduction a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the desiderata for an ideal lifelong learning system (Section 2.6), discuss how lifelong learning is related to other learning paradigms (Section 2.7), describe common metrics used to evaluate lifelong learning systems (Section 2.8). This chapter is more useful for readers who are new to lifelong learning and want to get introduced to the field without focusing on specific approaches or benchmarks. The remaining chapters focus on specific aspects (either learning algorithms or benchmarks) and are more useful for readers who are looking for specific approaches or benchmarks. Chapter 3 focuses on regularization-based approaches that do not assume access to any data from previous tasks. Chapter 4 discusses memory-based approaches that typically use a replay buffer or an episodic memory to save subset of data across different tasks. Chapter 5 focuses on different architecture families (and their instantiations) that have been proposed for training lifelong learning systems. Following these different classes of learning algorithms, we discuss the commonly used evaluation benchmarks and metrics for lifelong learning (Chapter 6) and wrap up with a discussion of future challenges and important research directions in Chapter 7.


Gradient Episodic Memory with a Soft Constraint for Continual Learning

arXiv.org Artificial Intelligence

Catastrophic forgetting in continual learning is a common destructive phenomenon in gradient-based neural networks that learn sequential tasks, and it is much different from forgetting in humans, who can learn and accumulate knowledge throughout their whole lives. Catastrophic forgetting is the fatal shortcoming of a large decrease in performance on previous tasks when the model is learning a novel task. To alleviate this problem, the model should have the capacity to learn new knowledge and preserve learned knowledge. We propose an average gradient episodic memory (A-GEM) with a soft constraint $\epsilon \in [0, 1]$, which is a balance factor between learning new knowledge and preserving learned knowledge; our method is called gradient episodic memory with a soft constraint $\epsilon$ ($\epsilon$-SOFT-GEM). $\epsilon$-SOFT-GEM outperforms A-GEM and several continual learning benchmarks in a single training epoch; additionally, it has state-of-the-art average accuracy and efficiency for computation and memory, like A-GEM, and provides a better trade-off between the stability of preserving learned knowledge and the plasticity of learning new knowledge.


Gradient Episodic Memory for Continual Learning

Neural Information Processing Systems

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks.