Goal-Conditioned On-Policy Reinforcement Learning
Xudong Gong

Neural Information Processing Systems 

This limitation prevents HER from densifying the reward.