Policy Continuation with Hindsight Inverse Dynamics

Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou

Nov-1-2019–arXiv.org Machine Learning

Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). The approach learns from Hindsight Inverse Dynamics, which builds on Hindsight Experience Replay, so that the policy learns in a self-imitating manner and can be trained with supervised learning. We also extend the method to multi-step settings with Policy Continuation. The proposed method is general: it can be used in isolation or combined with other on-policy and off-policy algorithms. On two multi-goal tasks, GridWorld and FetchReach, PCHID significantly improves both sample efficiency and final performance.
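As a rough illustration of the one-step hindsight idea described in the abstract, the sketch below relabels each random transition with the state it actually reached and stores the result as a supervised (state, goal) to action example. Everything here is an assumption made for illustration: the 5x5 grid, the `step` and `collect_one_step_hid` helpers, and the use of a lookup table in place of a learned model are not taken from the paper.

```python
import random

# Hypothetical 5x5 grid with four unit moves; not the paper's environment.
SIZE = 5
ACTIONS = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}


def step(state, action):
    """Deterministic grid transition; moves that would leave the grid are clipped."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), SIZE - 1)
    y = min(max(state[1] + dy, 0), SIZE - 1)
    return (x, y)


def collect_one_step_hid(num_transitions, seed=0):
    """Sample random transitions and relabel each one in hindsight.

    The state actually reached is treated as the goal, so every transition
    gives a supervised example: (state, achieved_goal) -> action.
    A dict stands in for the supervised learner (an assumption of this sketch).
    """
    rng = random.Random(seed)
    dataset = {}
    for _ in range(num_transitions):
        state = (rng.randrange(SIZE), rng.randrange(SIZE))
        action = rng.randrange(len(ACTIONS))
        next_state = step(state, action)
        if next_state != state:  # skip moves blocked by a wall
            dataset[(state, next_state)] = action
    return dataset


policy = collect_one_step_hid(5000)
```

Looking up `policy[(state, goal)]` for an adjacent goal returns an action that reaches it in one step. Goals further away are not covered by this table; handling them is the role the abstract assigns to the multi-step Policy Continuation extension.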

hindsight inverse dynamic, learning, pchid

Country:
- North America > Canada (0.04)
- Europe > France
  - Hauts-de-France > Nord > Lille (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China > Hong Kong (0.04)
  - Vietnam > Hanoi
    - Hanoi (0.04)

Genre:
- Research Report (0.64)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
