k-step solvability
To Review 1: 2 Q1: The connection between the policy and the Hindsight Inverse Dynamics(HID). Instead of mapping (s
We thank all reviewers for their insightful comments. Please see the responses below. Q2: Why is it important to relabel data to learn HID? And multistep HIDs help such extrapolations in non-trivial cases. And Fig.1(b) below shows similar results in For most goal-oriented tasks, the learning objective is to find a policy to reach the goal as soon as possible.