Meta-RL algorithms
Model-based Adversarial Meta-Reinforcement Learning
Meta-reinforcement learning (meta-RL) aims to learn, from multiple training tasks, the ability to adapt efficiently to unseen test tasks. Despite these successes, existing meta-RL algorithms are known to be sensitive to task distribution shift: when the test task distribution differs from the training task distribution, performance may degrade significantly. To address this issue, this paper proposes \textit{Model-based Adversarial Meta-Reinforcement Learning} (AdMRL), where we aim to minimize the worst-case suboptimality gap --- the difference between the optimal return and the return that the algorithm achieves after adaptation --- across all tasks in a family of tasks, with a model-based approach. We propose a minimax objective and optimize it by alternating between learning the dynamics model on a fixed task and finding the \textit{adversarial} task for the current model --- the task for which the policy induced by the model is maximally suboptimal. Assuming the family of tasks is parameterized, we derive a formula for the gradient of the suboptimality with respect to the task parameters via the implicit function theorem, and show how the gradient estimator can be efficiently implemented by the conjugate gradient method and a novel use of the REINFORCE estimator. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy over existing state-of-the-art meta-RL algorithms in worst-case performance over all tasks, in generalization to out-of-distribution tasks, and in training- and test-time sample efficiency.
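A plausible formalization of this minimax objective, using task parameters $\psi \in \Psi$, a learned dynamics model $\hat{T}$, the task-optimal policy $\pi^{\star}_{\psi}$, and the policy $\pi_{\hat{T},\psi}$ induced by adapting under the model (notation assumed here for illustration, not taken from the paper), is:

$$
\min_{\hat{T}} \; \max_{\psi \in \Psi} \; \Big[ V_{\psi}\big(\pi^{\star}_{\psi}\big) - V_{\psi}\big(\pi_{\hat{T},\psi}\big) \Big],
$$

where $V_{\psi}(\pi)$ denotes the expected return of policy $\pi$ on task $\psi$. The alternation described in the abstract then reads as blockwise optimization: fit $\hat{T}$ on a fixed task (the inner minimization), then ascend in $\psi$ toward the adversarial task for the current model (the outer maximization).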
Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation
Yang Zhang, Huiwen Yan, Mushuang Liu
Model-Agnostic Meta-Learning (MAML) is a versatile meta-learning framework applicable to both supervised learning and reinforcement learning (RL). However, applying MAML to meta-reinforcement learning (meta-RL) presents notable challenges. First, MAML relies on second-order gradient computations, leading to significant computational and memory overhead. Second, the nested structure of its optimization increases the problem's complexity, making convergence to a global optimum more challenging. To overcome these limitations, we propose Directed-MAML, a novel task-directed meta-RL algorithm. Before the second-order gradient step, Directed-MAML applies an additional first-order task-directed approximation to estimate the effect of second-order gradients, thereby accelerating convergence to the optimum and reducing computational cost. Experimental results demonstrate that Directed-MAML surpasses MAML-based baselines in computational efficiency and convergence speed on CartPole-v1, LunarLander-v2, and a two-vehicle intersection-crossing scenario. Furthermore, we show that the task-directed approximation can be effectively integrated into other meta-learning algorithms, such as First-Order Model-Agnostic Meta-Learning (FOMAML) and Meta Stochastic Gradient Descent (Meta-SGD), yielding improved computational efficiency and convergence speed. A sketch of the gradient structure involved appears below.
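To make the second-order cost concrete, the minimal sketch below contrasts MAML's full meta-gradient (differentiating through the inner update) with the first-order, FOMAML-style approximation that this line of work builds on. The toy linear-regression tasks are illustrative stand-ins; the paper's specific task-directed approximation is not reproduced here.

```python
# Minimal sketch: MAML's second-order meta-gradient vs. a first-order
# (FOMAML-style) approximation on toy linear-regression tasks.
# Illustrative only; Directed-MAML's task-directed step is NOT shown.
import torch

def task_loss(w, x, y):
    # Tiny linear model: prediction = x @ w; mean squared error.
    return ((x @ w - y) ** 2).mean()

def maml_meta_grad(w, tasks, inner_lr=0.1, second_order=True):
    meta_loss = 0.0
    for x_tr, y_tr, x_val, y_val in tasks:
        inner = task_loss(w, x_tr, y_tr)
        # create_graph=True keeps the inner graph so the outer gradient
        # can differentiate through the inner update (second-order term).
        # With create_graph=False, g is detached and the result is the
        # first-order (FOMAML-style) approximation.
        (g,) = torch.autograd.grad(inner, w, create_graph=second_order)
        w_adapted = w - inner_lr * g          # one inner gradient step
        meta_loss = meta_loss + task_loss(w_adapted, x_val, y_val)
    (meta_g,) = torch.autograd.grad(meta_loss / len(tasks), w)
    return meta_g

# Usage: two synthetic linear-regression tasks (train/validation splits).
torch.manual_seed(0)
w = torch.zeros(3, requires_grad=True)

def make_task():
    w_true = torch.randn(3)
    x = torch.randn(8, 3)
    return x[:4], x[:4] @ w_true, x[4:], x[4:] @ w_true

tasks = [make_task(), make_task()]
g_second = maml_meta_grad(w, tasks, second_order=True)   # full MAML
g_first = maml_meta_grad(w, tasks, second_order=False)   # first-order
print(g_second, g_first)
```

The first-order variant avoids backpropagating through the inner gradient, which is the main source of MAML's extra compute and memory; the trade-off is a biased estimate of the true meta-gradient.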
Information-Theoretic Task Selection for Meta-Reinforcement Learning
Author Feedback (Paper ID 10791)
We thank all the reviewers for their thoughtful feedback. Our response can be found below, organized by review.
R1: "It is not yet clear how results on such simple 'toy' tasks will, if ever, generalize to practically important task distributions. But this current limitation does and should not stop progress towards such seminal contributions."
We agree that scalability to more complex settings is challenging (more on this in our response to Reviewer 3), but this is a challenge for all of meta-RL. Our work identifies a clear gap in the literature and introduces a method that provides a first solution to the problem, performing reliably well on a number of current meta-RL benchmarks.