To Reviewer 1: Q1: The connection between the policy and the Hindsight Inverse Dynamics (HID). Instead of mapping (s

Neural Information Processing Systems 

We thank all reviewers for their insightful comments. Please see the responses below.

Q2: Why is it important to relabel data to learn HID? Multi-step HIDs help such extrapolations in non-trivial cases. Fig. 1(b) below shows similar results in

For most goal-oriented tasks, the learning objective is to find a policy that reaches the goal as soon as possible.