Policy Continuation with Hindsight Inverse Dynamics
Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou
Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). This approach learns from Hindsight Inverse Dynamics based on Hindsight Experience Replay, enabling the policy to learn in a self-imitating manner and thus be trained with supervised learning. We further extend it to multi-step settings with Policy Continuation. The proposed method is general and can work in isolation or be combined with other on-policy and off-policy algorithms. On two multi-goal tasks, GridWorld and FetchReach, PCHID significantly improves both sample efficiency and final performance.
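The core idea behind Hindsight Inverse Dynamics is that each transition is relabeled with the goal it actually achieved, after which the policy can be trained by supervised learning to reproduce the action that led there. Below is a minimal sketch of that one-step update, assuming discrete actions and a PyTorch policy network; the names `Policy`, `hid_update`, and `achieved_goal`, as well as the network architecture, are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch of one-step Hindsight Inverse Dynamics (HID) training.
# Assumes discrete actions; names and architecture are illustrative.
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Maps a (state, goal) pair to action logits."""
    def __init__(self, state_dim, goal_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def hid_update(policy, optimizer, episode, achieved_goal):
    """One supervised HID update: relabel each transition (s_t, a_t, s_{t+1})
    with the hindsight goal g = m(s_{t+1}) and imitate the taken action."""
    states, actions, next_states = episode   # tensors of shape [T, ...]
    goals = achieved_goal(next_states)       # m(s_{t+1}): map states to goal space
    logits = policy(states, goals)           # predict the action in hindsight
    loss = nn.functional.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the supervised target is always the action that was actually taken, every transition yields a valid training label regardless of whether the original goal was reached, which is how the hindsight relabeling sidesteps the sparse-reward problem.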