Meta-RL algorithms
Model-based Adversarial Meta-Reinforcement Learning
Meta-reinforcement learning (meta-RL) aims to learn, from multiple training tasks, the ability to adapt efficiently to unseen test tasks. Despite these successes, existing meta-RL algorithms are known to be sensitive to task distribution shift: when the test task distribution differs from the training task distribution, performance may degrade significantly. To address this issue, this paper proposes \textit{Model-based Adversarial Meta-Reinforcement Learning} (AdMRL), where we aim to minimize the worst-case suboptimality gap --- the difference between the optimal return and the return that the algorithm achieves after adaptation --- across all tasks in a family of tasks, with a model-based approach. We propose a minimax objective and optimize it by alternating between learning the dynamics model on a fixed task and finding the \textit{adversarial} task for the current model --- the task for which the policy induced by the model is maximally suboptimal. Assuming the family of tasks is parameterized, we derive a formula for the gradient of the suboptimality with respect to the task parameters via the implicit function theorem, and show how the gradient estimator can be efficiently implemented by the conjugate gradient method and a novel use of the REINFORCE estimator. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy over existing state-of-the-art meta-RL algorithms in worst-case performance over all tasks, in generalization to out-of-distribution tasks, and in training- and test-time sample efficiency.
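A plausible formalization of this minimax objective, using task parameters $\psi \in \Psi$, a learned dynamics model $\hat{T}$, the task-optimal policy $\pi^{\star}_{\psi}$, and the policy $\pi_{\hat{T},\psi}$ induced by adapting under the model (notation assumed here for illustration, not taken from the paper), is:

$$
\min_{\hat{T}} \; \max_{\psi \in \Psi} \; \Big[ V_{\psi}\big(\pi^{\star}_{\psi}\big) - V_{\psi}\big(\pi_{\hat{T},\psi}\big) \Big],
$$

where $V_{\psi}(\pi)$ denotes the expected return of policy $\pi$ on task $\psi$. The alternation described in the abstract then reads as blockwise optimization: fit $\hat{T}$ on a fixed task (the inner minimization), then ascend in $\psi$ toward the adversarial task for the current model (the outer maximization).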
Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation
Yang Zhang, Huiwen Yan, Mushuang Liu
Model-Agnostic Meta-Learning (MAML) is a versatile meta-learning framework applicable to both supervised learning and reinforcement learning (RL). However, applying MAML to meta-reinforcement learning (meta-RL) presents notable challenges. First, MAML relies on second-order gradient computations, leading to significant computational and memory overhead. Second, the nested structure of its optimization increases the problem's complexity, making convergence to a global optimum more challenging. To overcome these limitations, we propose Directed-MAML, a novel task-directed meta-RL algorithm. Before the second-order gradient step, Directed-MAML applies an additional first-order task-directed approximation to estimate the effect of second-order gradients, thereby accelerating convergence to the optimum and reducing computational cost. Experimental results demonstrate that Directed-MAML surpasses MAML-based baselines in computational efficiency and convergence speed on CartPole-v1, LunarLander-v2, and a two-vehicle intersection-crossing scenario. Furthermore, we show that the task-directed approximation can be effectively integrated into other meta-learning algorithms, such as First-Order Model-Agnostic Meta-Learning (FOMAML) and Meta Stochastic Gradient Descent (Meta-SGD), yielding improved computational efficiency and convergence speed. A sketch of the gradient structure involved appears below.
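To make the second-order cost concrete, the minimal sketch below contrasts MAML's full meta-gradient (differentiating through the inner update) with the first-order, FOMAML-style approximation that this line of work builds on. The toy linear-regression tasks are illustrative stand-ins; the paper's specific task-directed approximation is not reproduced here.

```python
# Minimal sketch: MAML's second-order meta-gradient vs. a first-order
# (FOMAML-style) approximation on toy linear-regression tasks.
# Illustrative only; Directed-MAML's task-directed step is NOT shown.
import torch

def task_loss(w, x, y):
    # Tiny linear model: prediction = x @ w; mean squared error.
    return ((x @ w - y) ** 2).mean()

def maml_meta_grad(w, tasks, inner_lr=0.1, second_order=True):
    meta_loss = 0.0
    for x_tr, y_tr, x_val, y_val in tasks:
        inner = task_loss(w, x_tr, y_tr)
        # create_graph=True keeps the inner graph so the outer gradient
        # can differentiate through the inner update (second-order term).
        # With create_graph=False, g is detached and the result is the
        # first-order (FOMAML-style) approximation.
        (g,) = torch.autograd.grad(inner, w, create_graph=second_order)
        w_adapted = w - inner_lr * g          # one inner gradient step
        meta_loss = meta_loss + task_loss(w_adapted, x_val, y_val)
    (meta_g,) = torch.autograd.grad(meta_loss / len(tasks), w)
    return meta_g

# Usage: two synthetic linear-regression tasks (train/validation splits).
torch.manual_seed(0)
w = torch.zeros(3, requires_grad=True)

def make_task():
    w_true = torch.randn(3)
    x = torch.randn(8, 3)
    return x[:4], x[:4] @ w_true, x[4:], x[4:] @ w_true

tasks = [make_task(), make_task()]
g_second = maml_meta_grad(w, tasks, second_order=True)   # full MAML
g_first = maml_meta_grad(w, tasks, second_order=False)   # first-order
print(g_second, g_first)
```

The first-order variant avoids backpropagating through the inner gradient, which is the main source of MAML's extra compute and memory; the trade-off is a biased estimate of the true meta-gradient.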
Information-Theoretic Task Selection for Meta-Reinforcement Learning
Author Feedback (Paper ID 10791)
We thank all the reviewers for their thoughtful feedback. Our response can be found below, organized by review.
R1: "It is not yet clear how results on such simple 'toy' tasks will, if ever, generalize to practically important task distributions. But this current limitation does and should not stop progress towards such seminal contributions."
We agree that scalability to more complex settings is challenging (more on this in our response to Reviewer 3), but this is a challenge for all of meta-RL. Our work identifies a clear gap in the literature and introduces a method that provides a first solution to the problem, performing reliably well on a number of current meta-RL benchmarks.