Reviews: Exploration via Hindsight Goal Generation
Neural Information Processing Systems
The authors propose a new method for sampling exploration goals in goal-conditioned RL with hindsight experience replay (HER). They derive a lower bound that relies on a Lipschitz property of the goal-conditioned value function with respect to the distance between goals and states. Across various Fetch-robot tasks, they demonstrate that their method, when combined with EBP (a method for relabeling goals), outperforms HER. Several ablations show that the method is relatively insensitive to hyperparameter values. Overall, the empirical results are solid, but the math behind the paper is rather troubling.
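To make the kind of bound mentioned in this summary concrete, here is a hedged sketch of how a Lipschitz assumption on a goal-conditioned value function can yield a lower bound. It is not necessarily the paper's exact statement: the symbols $V^{\pi}$, $L$, $d$, $T$, $T^{*}$, $\mu$ and the use of the 1-Wasserstein distance $W_1$ are assumptions introduced here for illustration.

```latex
% Assumed notation (not taken from the review):
%   V^{\pi}(s, g) : goal-conditioned value of policy \pi from state s toward goal g
%   d             : a metric on (state, goal) pairs
%   T, T^{*}      : two distributions over (state, goal) pairs
%   V^{\pi}(T)    : shorthand for E_{(s,g) ~ T}[ V^{\pi}(s, g) ]

% Step 1: assume V^{\pi} is L-Lipschitz with respect to d.
\[
  \bigl| V^{\pi}(s, g) - V^{\pi}(s', g') \bigr|
  \;\le\; L \, d\bigl((s, g), (s', g')\bigr)
\]

% Step 2: for any coupling \mu of T and T^{*}, take expectations of Step 1.
\[
  V^{\pi}(T^{*}) - V^{\pi}(T)
  \;=\; \mathbb{E}_{\mu}\bigl[ V^{\pi}(s', g') - V^{\pi}(s, g) \bigr]
  \;\le\; L \, \mathbb{E}_{\mu}\bigl[ d\bigl((s, g), (s', g')\bigr) \bigr]
\]

% Step 3: minimize the right-hand side over all couplings \mu, which gives W_1.
\[
  V^{\pi}(T) \;\ge\; V^{\pi}(T^{*}) - L \, W_1(T, T^{*})
\]
```

The final inequality holds only if the Lipschitz assumption in Step 1 does; the sketch does not establish that assumption.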
Jan-23-2025, 22:02:49 GMT