On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies

Wang, Haozhi, Wang, Qing, Shao, Yunfeng, Li, Dong, Hao, Jianye, Li, Yinchuan

Sep-20-2022, arXiv.org Artificial Intelligence

Modern meta-reinforcement learning (Meta-RL) methods mostly build on model-agnostic meta-learning, which performs policy gradient steps across tasks to maximize policy performance. However, the gradient conflict problem is still poorly understood in Meta-RL and may degrade performance when the tasks are distinct. To tackle this challenge, this paper proposes a novel personalized Meta-RL (pMeta-RL) algorithm, which aggregates task-specific personalized policies to update a meta-policy used for all tasks, while maintaining personalized policies that maximize the average return of each task under the constraint of the meta-policy. We also provide a theoretical analysis in the tabular setting that establishes the convergence of the pMeta-RL algorithm. Moreover, we extend pMeta-RL to a deep network version based on soft actor-critic, making it suitable for continuous control tasks. Experimental results show that the proposed algorithms outperform previous Meta-RL algorithms on the Gym and MuJoCo suites.
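The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes each task is a single-state bandit with known rewards, uses softmax policies over logits, and stands in for the "constraint of the meta-policy" with a quadratic penalty on the distance between personalized and meta logits. The function names and the parameters `lr`, `lam`, and `beta` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(logits, rewards):
    """Expected one-step reward of the softmax policy on a bandit task."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))


def personalized_step(theta, meta, rewards, lr=0.5, lam=0.1):
    """One exact policy-gradient step on a task, pulled toward the meta-policy.

    Assumption: the meta-policy constraint is modeled as the penalty
    (lam / 2) * ||theta - meta||^2 on the logits.
    """
    probs = softmax(theta)
    j = sum(p * r for p, r in zip(probs, rewards))
    # d/d theta_a of the expected reward under softmax is pi(a) * (r(a) - J).
    return [
        t + lr * (p * (r - j) - lam * (t - m))
        for t, m, p, r in zip(theta, meta, probs, rewards)
    ]


def train(task_rewards, rounds=200, local_steps=5, beta=0.5):
    """Alternate personalized updates with aggregation into the meta-policy."""
    n_actions = len(task_rewards[0])
    meta = [0.0] * n_actions
    personal = [list(meta) for _ in task_rewards]
    for _ in range(rounds):
        for i, rewards in enumerate(task_rewards):
            for _ in range(local_steps):
                personal[i] = personalized_step(personal[i], meta, rewards)
        # Aggregate: move the meta logits toward the mean personalized logits.
        avg = [sum(col) / len(personal) for col in zip(*personal)]
        meta = [(1 - beta) * m + beta * a for m, a in zip(meta, avg)]
    return meta, personal


if __name__ == "__main__":
    # Two tasks with conflicting optimal actions.
    tasks = [[1.0, 0.0], [0.0, 1.0]]
    meta, personal = train(tasks)
    print("meta policy:", softmax(meta))
    for i, theta in enumerate(personal):
        print(f"task {i} personalized policy:", softmax(theta))
```

With two tasks whose best actions conflict, the shared meta-policy in this sketch stays near uniform, while each personalized policy favors its own task's best action. That is the kind of conflict a single shared policy cannot resolve on its own.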

artificial intelligence, machine learning, reinforcement learning

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (0.46)
