AITopics | meta-rl

Recurrent Hypernetworks are Surprisingly Strong in Meta-RL

Neural Information Processing Systems | Dec-26-2025, 17:21:35 GMT

Deep reinforcement learning (RL) is notoriously impractical to deploy due to sample inefficiency. Meta-RL directly addresses this sample inefficiency by learning to perform few-shot learning when a distribution of related tasks is available for meta-training. While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline. However, such claims have been controversial due to limited supporting evidence, particularly in the face of prior work establishing precisely the opposite. In this paper, we conduct an empirical investigation. While we likewise find that a recurrent network can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to maximizing their potential. Surprisingly, when combined with hypernetworks, the recurrent baselines that are far simpler than existing specialized methods actually achieve the strongest performance of all methods evaluated. We provide code at https://github.com/jacooba/hyper.
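To make the idea of a recurrent hypernetwork concrete, here is a minimal sketch in plain Python. It is an illustration only, not the architecture from the paper or its repository: the Elman-style update, the single linear hypernetwork, the layer sizes, and every function name (`rnn_step`, `hyper_policy_logits`, `summarize_history`) are assumptions made for this example. A recurrent state summarizes the interaction history, and a hypernetwork maps that state to the weights of the policy head instead of feeding the state to the policy as an ordinary input.

```python
import math
import random


def random_matrix(rows, cols, rng, scale=0.5):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, bias, x):
    """Affine map: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def rnn_step(params, h, x):
    """Elman-style update h' = tanh(W_h h + W_x x + b) (assumed form)."""
    rec = linear(params["W_h"], params["b"], h)
    inp = linear(params["W_x"], [0.0] * len(h), x)
    return [math.tanh(r + i) for r, i in zip(rec, inp)]


def summarize_history(params, history, hidden_size):
    """Fold a sequence of input vectors into one recurrent state."""
    h = [0.0] * hidden_size
    for x in history:
        h = rnn_step(params, h, x)
    return h


def hyper_policy_logits(hyper, h, obs, n_actions):
    """Generate the policy head's weights from h, then apply them to obs."""
    width = len(obs) + 1  # one weight per observation feature plus a bias
    flat = linear(hyper["W"], hyper["b"], h)
    logits = []
    for a in range(n_actions):
        row = flat[a * width:(a + 1) * width]
        logits.append(sum(w * o for w, o in zip(row[:-1], obs)) + row[-1])
    return logits


rng = random.Random(0)
HIDDEN, OBS, ACTIONS = 4, 3, 2
rnn_params = {
    "W_h": random_matrix(HIDDEN, HIDDEN, rng),
    "W_x": random_matrix(HIDDEN, OBS, rng),
    "b": [0.0] * HIDDEN,
}
hyper_params = {
    "W": random_matrix(ACTIONS * (OBS + 1), HIDDEN, rng),
    "b": [0.0] * (ACTIONS * (OBS + 1)),
}

obs = [1.0, -1.0, 0.5]
# Two hypothetical histories standing in for experience from two different tasks.
h_task_a = summarize_history(rnn_params, [[1.0, 0.0, 0.0]] * 2, HIDDEN)
h_task_b = summarize_history(rnn_params, [[0.0, 0.0, 1.0]] * 2, HIDDEN)
logits_a = hyper_policy_logits(hyper_params, h_task_a, obs, ACTIONS)
logits_b = hyper_policy_logits(hyper_params, h_task_b, obs, ACTIONS)
```

In this sketch the same observation yields different action logits under the two histories, because the history changes the generated policy weights rather than being concatenated to the policy input.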

electronic proceedings, name change, recurrent hypernetwork, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture

Neural Information Processing Systems | Dec-25-2025, 02:51:04 GMT

The generalization ability of most meta-reinforcement learning (meta-RL) methods is largely limited to test tasks that are sampled from the same distribution used to sample training tasks. To overcome the limitation, we propose Latent Dynamics Mixture (LDM) that trains a reinforcement learning agent with imaginary tasks generated from mixtures of learned latent dynamics. By training a policy on mixture tasks along with original training tasks, LDM allows the agent to prepare for unseen test tasks during training and prevents the agent from overfitting the training tasks. LDM significantly outperforms standard meta-RL methods in test returns on the gridworld navigation and MuJoCo tasks where we strictly separate the training task distribution and the test task distribution.
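The abstract does not say how the mixtures are formed, so the following is only a rough sketch of one plausible reading: a convex combination of per-task latent vectors with randomly drawn weights. The Dirichlet weighting, the latent vectors, and the names `sample_mixture_weights` and `mix_latents` are all assumptions for illustration, not details taken from the LDM paper.

```python
import random


def sample_mixture_weights(n_tasks, rng, concentration=1.0):
    """Draw Dirichlet weights by normalizing independent Gamma samples."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(n_tasks)]
    total = sum(draws)
    return [d / total for d in draws]


def mix_latents(latents, weights):
    """Convex combination of task latent vectors, one coordinate at a time."""
    dim = len(latents[0])
    return [sum(w * z[i] for w, z in zip(weights, latents)) for i in range(dim)]


rng = random.Random(0)
# Hypothetical learned latents for three training tasks (2-D for readability).
train_latents = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
weights = sample_mixture_weights(len(train_latents), rng)
imaginary_latent = mix_latents(train_latents, weights)
```

A policy conditioned on `imaginary_latent` would then be trained on rollouts from a learned dynamics or reward decoder given that latent, alongside the original training tasks. That decoder is outside the scope of this sketch.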

generalization, imaginary task, latent dynamic mixture, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)

Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments

Neural Information Processing Systems | Dec-23-2025, 19:21:44 GMT

Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small (or just a single) number of steps, is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a sub-optimal agent, is available for each task. We then develop a class of algorithms entitled Enhanced Meta-RL via Demonstrations (EMRLD) that exploit this information, even if sub-optimal, to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot.

demonstration, enhanced meta reinforcement learning, sparse reward environment, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Doubly Robust Augmented Transfer for Meta-Reinforcement Learning

Anonymous Authors

Neural Information Processing Systems | Oct-9-2025, 11:39:14 GMT

[...] RL problems through the idea of "learning to learn". Current meta-RL methods can be classified into two categories. [...] These methods mainly differ in their ways of inference [3, 4, 20]. The other line follows the technique of relabeling that enables sample reuse across tasks, i.e., learning a task [...] Packer et al. apply hindsight relabeling for meta-RL, and propose hindsight task relabeling (HTR) to relabel the trajectories [...] Taking a step further than hindsight relabeling, Wan et al. additionally introduce foresight [...] Huang et al. derive a general form of policy gradient from the DR value estimator [29], whereas a DR off-policy actor-critic [...] Kallus et al. propose the doubly robust method to find a robust policy that can [...] Depending on the knowledge to be transferred, these methods in RL can be roughly divided into classes including sampled transitions [32, 33], learned policies or value networks [34, 35, 36, 37], features [38, 39, 40], and skills [41, 42]. [...] Doubly Robust Property for Direct Use of Doubly Robust Estimator: We show the doubly robust property of the DR estimator for the value function in Eq. (5) in the main text, as follows. [...]
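The excerpt refers to the doubly robust (DR) property without reproducing Eq. (5). As a rough illustration only, the sketch below uses the textbook one-step (bandit-style) DR estimator, which is an assumption here and not necessarily the form used in the paper. All names (`dr_estimate`, `expected_dr`, the toy policies and rewards) are hypothetical. It computes the exact expectation of the estimator under the behavior policy to show that the estimate is unbiased when either the reward model or the behavior-policy model is correct.

```python
def true_value(pi, reward):
    """Expected reward of the target policy pi."""
    return sum(pi[a] * reward[a] for a in pi)


def dr_estimate(action, observed_reward, pi, mu_hat, q_hat):
    """Single-sample DR estimate: model term plus importance-weighted residual."""
    model_term = sum(pi[a] * q_hat[a] for a in pi)
    weight = pi[action] / mu_hat[action]  # estimated importance ratio
    return model_term + weight * (observed_reward - q_hat[action])


def expected_dr(pi, mu_true, mu_hat, q_hat, reward):
    """Exact expectation of the DR estimate when actions are drawn from mu_true."""
    return sum(
        mu_true[a] * dr_estimate(a, reward[a], pi, mu_hat, q_hat) for a in mu_true
    )


# Toy two-action problem with deterministic rewards (hypothetical numbers).
reward = {0: 1.0, 1: 3.0}
pi = {0: 0.2, 1: 0.8}           # target policy
mu_true = {0: 0.5, 1: 0.5}      # behavior policy that generated the data
q_wrong = {0: 0.0, 1: 0.0}      # misspecified reward model
mu_wrong = {0: 0.9, 1: 0.1}     # misspecified behavior-policy model

target = true_value(pi, reward)
wrong_model_only = expected_dr(pi, mu_true, mu_true, q_wrong, reward)
wrong_weights_only = expected_dr(pi, mu_true, mu_wrong, reward, reward)
both_wrong = expected_dr(pi, mu_true, mu_wrong, q_wrong, reward)
```

In this toy setting `wrong_model_only` and `wrong_weights_only` both match `target`, while `both_wrong` does not. That is the sense in which the estimator is "doubly" robust.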

dr ij, machine learning, reinforcement learning, (10 more...)

Neural Information Processing Systems

Industry: Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Cherepanov, Egor, Kachaev, Nikita, Zholus, Artem, Kovalev, Alexey K., Panov, Aleksandr I.

arXiv.org Artificial Intelligence | Dec-9-2024

The incorporation of memory into agents is essential for numerous tasks within the domain of Reinforcement Learning (RL). In particular, memory is paramount for tasks that require the utilization of past information, adaptation to novel environments, and improved sample efficiency. However, the term "memory" encompasses a wide range of concepts, which, coupled with the lack of a unified methodology for validating an agent's memory, leads to erroneous judgments about agents' memory capabilities and prevents objective comparison with other memory-enhanced agents. This paper aims to streamline the concept of memory in RL by providing practical, precise definitions of agent memory types, such as long-term versus short-term memory and declarative versus procedural memory, inspired by cognitive science. Using these definitions, we categorize different classes of agent memory, propose a robust experimental methodology for evaluating the memory capabilities of RL agents, and standardize evaluations. Furthermore, by conducting experiments with different RL agents, we empirically demonstrate the importance of adhering to the proposed methodology when evaluating different types of agent memory, and show what violating it leads to.

Reinforcement Learning (RL) effectively addresses various problems within the Markov Decision Process (MDP) framework, where agents make decisions based on immediately available information (Mnih et al., 2015; Badia et al., 2020). However, there are still challenges in applying RL to more complex tasks with partial observability. To successfully address such challenges, it is essential that an agent is able to efficiently store and process the history of its interactions with the environment (Ni et al., 2021). Sequence processing methods originally developed for natural language processing (NLP) can be effectively applied to these tasks because the history of interactions with the environment can be represented as a sequence (Hausknecht & Stone, 2015; Esslinger et al., 2022; Samsami et al., 2024). However, in many tasks, due to the complexity or noisiness of observations, the sparsity of events, the difficulty of designing the reward function, and the long duration of episodes, storing and retrieving important information becomes extremely challenging, and the need for memory mechanisms arises (Graves et al., 2016; Wayne et al., 2018; Goyal et al., 2022).

machine learning, natural language, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

arXiv: 2412.06531

Country:

Asia > Russia (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.69)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Improved Robustness and Safety for Pre-Adaptation of Meta Reinforcement Learning with Prior Regularization

Wen, Lu, Zhang, Songan, Tseng, H. Eric, Singh, Baljeet, Filev, Dimitar, Peng, Huei

arXiv.org Artificial Intelligence | Feb-9-2023

Meta Reinforcement Learning (Meta-RL) has seen substantial advancements recently. In particular, off-policy methods were developed to improve the data efficiency of Meta-RL techniques. Probabilistic embeddings for actor-critic RL (PEARL) is a leading approach for multi-MDP adaptation problems. A major drawback of many existing Meta-RL methods, including PEARL, is that they do not explicitly consider the safety of the prior policy when it is exposed to a new task for the first time. Safety is essential for many real-world applications, including field robots and Autonomous Vehicles (AVs). In this paper, we develop the PEARL PLUS (PEARL+) algorithm, which optimizes the policy for both prior (pre-adaptation) safety and posterior (after-adaptation) performance. Building on top of PEARL, our proposed PEARL+ algorithm introduces a prior regularization term in the reward function and a new Q-network for recovering the state-action value under prior context assumptions, to improve both the robustness to task distribution shift and the safety of the trained network when it is exposed to a new task for the first time. The performance of PEARL+ is validated by solving three safety-critical problems related to robots and AVs, including two MuJoCo benchmark problems. From the simulation experiments, we show that the safety of the prior policy is significantly improved, and that the policy is more robust to task distribution shift, compared to PEARL.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IROS47612.2022.9981621

arXiv: 2108.08448

Country: North America > United States > Michigan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
