AITopics | gradient episodic memory

Gradient Episodic Memory for Continual Learning

Neural Information Processing SystemsMar-17-2026, 18:55:25 GMT

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.

artificial intelligence, name change, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.83)

Add feedback

Gradient Episodic Memory for Continual Learning

Neural Information Processing SystemsNov-21-2025, 16:17:44 GMT

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.

continual learning, gradient episodic memory, name change, (2 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Consumer Health (0.68)

Technology: Information Technology > Artificial Intelligence (0.83)

Add feedback

Gradient Episodic Memory for Continual Learning

David Lopez-Paz, Marc'Aurelio Ranzato

Neural Information Processing SystemsNov-21-2025, 13:57:16 GMT

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge.

artificial intelligence, learning, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry:

Education (0.94)
Health & Medicine > Consumer Health (0.42)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Review for NeurIPS paper: Continual Deep Learning by Functional Regularisation of Memorable Past

Neural Information Processing SystemsJan-23-2025, 03:07:34 GMT

What are the real contributions of the paper? The idea of regularizing the outputs (or functional-regularization) has already been explored, as already said in the paper. Combining the idea of regularizing the outputs with memory-based methods is also already explored. Please see GEM [1] and A-GEM [2]. What makes this approach better or important, e.g.

continual deep learning, functional regularisation, learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

Reviews: Gradient Episodic Memory for Continual Learning

Neural Information Processing SystemsOct-8-2024, 13:29:24 GMT

The authors of the manuscript consider the continuum learning setting, where the learner observes a stream of data points from training, which are ordered according to the tasks they belong to, i.e. the learner encounters any data from the next task only after it has observed all the training data for the current one. The authors propose a set of three metrics for evaluation performance of learning algorithms in this setting, which reflect their ability to transfer information to new tasks and not forget information about the earlier tasks. Could the authors, please, comment on the difference between continuum and lifelong learning (the corresponding sentence in line 254 seems incomplete)? The authors also propose a learning method, termed Gradient of Episodic Memory (GEM). The idea of the method is to keep a set of examples from every observed task and make sure that at each update stage the loss on the observed tasks does not increase.

continual learning, gem, gradient episodic memory, (6 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Consumer Health (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.62)

Add feedback

Gradient Episodic Memory for Continual Learning

David Lopez-Paz, Marc'Aurelio Ranzato

Neural Information Processing SystemsOct-4-2024, 10:57:39 GMT

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.

backward transfer, continuum, learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry:

Education (0.69)
Health & Medicine > Consumer Health (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.72)

Add feedback

A Multi-Task Approach to Robust Deep Reinforcement Learning for Resource Allocation

Gracla, Steffen, Bockelmann, Carsten, Dekorsy, Armin

arXiv.org Artificial IntelligenceApr-25-2023

With increasing complexity of modern communication systems, machine learning algorithms have become a focal point of research. However, performance demands have tightened in parallel to complexity. For some of the key applications targeted by future wireless, such as the medical field, strict and reliable performance guarantees are essential, but vanilla machine learning methods have been shown to struggle with these types of requirements. Therefore, the question is raised whether these methods can be extended to better deal with the demands imposed by such applications. In this paper, we look at a combinatorial resource allocation challenge with rare, significant events which must be handled properly. We propose to treat this as a multi-task learning problem, select two methods from this domain, Elastic Weight Consolidation and Gradient Episodic Memory, and integrate them into a vanilla actor-critic scheduler. We compare their performance in dealing with Black Swan Events with the state-of-the-art of augmenting the training data distribution and report that the multi-task approach proves highly effective.

machine learning, reinforcement learning, scheduler, (14 more...)

arXiv.org Artificial Intelligence

2304.1266

Country:

Europe > Germany > Bremen > Bremen (0.28)
North America > United States (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.88)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

An Introduction to Lifelong Supervised Learning

Sodhani, Shagun, Faramarzi, Mojtaba, Mehta, Sanket Vaibhav, Malviya, Pranshu, Abdelsalam, Mohamed, Janarthanan, Janarthanan, Chandar, Sarath

arXiv.org Artificial IntelligenceJul-12-2022

This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2 which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide 8 Introduction a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the desiderata for an ideal lifelong learning system (Section 2.6), discuss how lifelong learning is related to other learning paradigms (Section 2.7), describe common metrics used to evaluate lifelong learning systems (Section 2.8). This chapter is more useful for readers who are new to lifelong learning and want to get introduced to the field without focusing on specific approaches or benchmarks. The remaining chapters focus on specific aspects (either learning algorithms or benchmarks) and are more useful for readers who are looking for specific approaches or benchmarks. Chapter 3 focuses on regularization-based approaches that do not assume access to any data from previous tasks. Chapter 4 discusses memory-based approaches that typically use a replay buffer or an episodic memory to save subset of data across different tasks. Chapter 5 focuses on different architecture families (and their instantiations) that have been proposed for training lifelong learning systems. Following these different classes of learning algorithms, we discuss the commonly used evaluation benchmarks and metrics for lifelong learning (Chapter 6) and wrap up with a discussion of future challenges and important research directions in Chapter 7.

fisher information matrix, lifelong learning approach, online continual learning, (16 more...)

arXiv.org Artificial Intelligence

2207.04354

Country:

North America > United States > California > San Francisco County > San Francisco (0.13)
North America > United States > California > Los Angeles County > Los Angeles (0.13)
North America > Canada > Quebec > Montreal (0.04)
(13 more...)

Genre: Instructional Material (1.00)

Industry: Education > Educational Setting > Continuing Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
(11 more...)

Add feedback

Gradient Episodic Memory with a Soft Constraint for Continual Learning

Hu, Guannan, Zhang, Wu, Ding, Hu, Zhu, Wenhao

arXiv.org Artificial IntelligenceNov-16-2020

Catastrophic forgetting in continual learning is a common destructive phenomenon in gradient-based neural networks that learn sequential tasks, and it is much different from forgetting in humans, who can learn and accumulate knowledge throughout their whole lives. Catastrophic forgetting is the fatal shortcoming of a large decrease in performance on previous tasks when the model is learning a novel task. To alleviate this problem, the model should have the capacity to learn new knowledge and preserve learned knowledge. We propose an average gradient episodic memory (A-GEM) with a soft constraint $\epsilon \in [0, 1]$, which is a balance factor between learning new knowledge and preserving learned knowledge; our method is called gradient episodic memory with a soft constraint $\epsilon$ ($\epsilon$-SOFT-GEM). $\epsilon$-SOFT-GEM outperforms A-GEM and several continual learning benchmarks in a single training epoch; additionally, it has state-of-the-art average accuracy and efficiency for computation and memory, like A-GEM, and provides a better trade-off between the stability of preserving learned knowledge and the plasticity of learning new knowledge.

a-gem, episodic memory, knowledge, (15 more...)

arXiv.org Artificial Intelligence

2011.07801

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Shanghai > Shanghai (0.05)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.84)

Add feedback

Gradient Episodic Memory for Continual Learning

Lopez-Paz, David, Ranzato, Marc', Aurelio

Neural Information Processing SystemsFeb-14-2020, 18:57:30 GMT

One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks.

continual learning, gradient episodic memory, knowledge

Neural Information Processing Systems

Industry: Health & Medicine > Consumer Health (0.66)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.66)

Add feedback