Policy Continuation with Hindsight Inverse Dynamics

Sun, Hao, Li, Zhizhong, Liu, Xiaotong, Zhou, Bolei, Lin, Dahua

Mar-19-2020, 00:47:59 GMT–Neural Information Processing Systems

Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). This approach learns from Hindsight Inverse Dynamics, which builds on Hindsight Experience Replay. This enables learning in a self-imitated manner, so the policy can be trained with supervised learning.
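The hindsight relabeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only a one-step case, and the toy 1-D chain environment and every function name (`step`, `achieved_goal`, `collect_hindsight_pairs`, `fit_policy`) are hypothetical, chosen for illustration. Each transition is relabeled so that the state actually reached is treated as the goal, which turns sparse-reward experience into ordinary supervised (input, action) pairs.

```python
import random
from collections import Counter, defaultdict


def step(state, action):
    """Hypothetical 1-D chain environment: the state is an integer position."""
    return state + action


def achieved_goal(state):
    """Map a state to the goal it achieves (identity in this toy setting)."""
    return state


def collect_hindsight_pairs(num_steps, seed=0):
    """Roll out a random policy and relabel each transition in hindsight."""
    rng = random.Random(seed)
    state = 0
    pairs = []
    for _ in range(num_steps):
        action = rng.choice([-1, 1])
        next_state = step(state, action)
        # Hindsight relabeling: treat the state actually reached as the goal,
        # so the action taken becomes a correct label for (state, goal).
        pairs.append(((state, achieved_goal(next_state)), action))
        state = next_state
    return pairs


def fit_policy(pairs):
    """Supervised fit: majority-vote action for each (state, goal) input."""
    votes = defaultdict(Counter)
    for inputs, action in pairs:
        votes[inputs][action] += 1
    return {inputs: counts.most_common(1)[0][0] for inputs, counts in votes.items()}


pairs = collect_hindsight_pairs(num_steps=200)
policy = fit_policy(pairs)
```

In this sketch the tabular majority vote stands in for whatever function approximator would be trained in practice; the point is only that no reward signal is needed once the goals are relabeled.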

hindsight inverse dynamics, policy continuation

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)