Reviews: Curriculum-guided Hindsight Experience Replay

Neural Information Processing Systems 

The paper borrows tools from combinatorial optimization (i.e., the facility location problem) to select hindsight goals that are simultaneously diverse and close to the desired goals. As mentioned, the similarity metric used for the proximity term seems to require domain knowledge, namely that Euclidean distance works well for this task. This may be problematic when obstacles make Euclidean distance misleading, or in other environments where it is less obvious what the similarity metric should be. I am aware that this dense similarity metric is used only for selecting hindsight goals, and that the underlying Q-function/policy is still trained on the sparse reward (without the bias). Several related works could be discussed and potentially benchmarked against in terms of hindsight goal sampling schemes: sampling from the ground-truth distribution half the time for relabeling, and using "future" relabeling the other half (in Appendix).
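
To make the concern about the proximity term concrete, here is a minimal sketch of the kind of selection rule described above: a greedy pass that trades off facility-location coverage (diversity) against Euclidean closeness to the desired goals. This is my own illustration, not the authors' implementation. The function name `greedy_select`, the weight `lam`, and the RBF similarity kernel are all assumptions made for the example.

```python
import numpy as np

def greedy_select(achieved, desired, k, lam=1.0):
    """Greedily pick k candidate goals, trading off diversity and proximity.

    achieved: (n, d) array of candidate hindsight goals.
    desired:  (m, d) array of desired goals.
    lam:      hypothetical weight on the proximity term.
    """
    n = len(achieved)
    # Pairwise similarity between candidates (RBF kernel; an assumed choice).
    d = np.linalg.norm(achieved[:, None, :] - achieved[None, :, :], axis=-1)
    sim = np.exp(-d ** 2)
    # Proximity: negative Euclidean distance to the nearest desired goal.
    # This is the term that presumes Euclidean distance is meaningful.
    to_goal = np.linalg.norm(achieved[:, None, :] - desired[None, :, :], axis=-1)
    prox = -to_goal.min(axis=1)

    selected = []
    taken = np.zeros(n, dtype=bool)
    cover = np.zeros(n)  # best similarity of each candidate to the selected set
    for _ in range(min(k, n)):
        # Marginal gain in facility-location coverage from adding candidate j.
        gain = np.maximum(sim, cover[:, None]).sum(axis=0) - cover.sum()
        score = gain + lam * prox
        score[taken] = -np.inf  # never pick the same candidate twice
        j = int(np.argmax(score))
        selected.append(j)
        taken[j] = True
        cover = np.maximum(cover, sim[:, j])
    return selected

# Two tight clusters of candidates; the desired goal sits in the first cluster.
achieved = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
desired = np.array([[0.0, 0.0]])
diverse = greedy_select(achieved, desired, k=2, lam=0.0)    # diversity only
near = greedy_select(achieved, desired, k=2, lam=100.0)     # proximity dominates
```

With `lam=0.0` the two picks land in different clusters, while a large `lam` pulls both picks toward the desired goal. If an obstacle lay between the clusters, the Euclidean `prox` term would still rank candidates by straight-line distance, which is exactly the failure mode raised above.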