Review for NeurIPS paper: Generalized Hindsight for Reinforcement Learning

Neural Information Processing Systems 

Weaknesses:
- The main weakness of the paper, in my opinion, is the lack of theoretical rigor to justify some of the claims, together with language that is often imprecise. For example:
  - The description of the method in lines 55-56 is misleading: it suggests that the trajectory is relabeled and that the original trajectory with its originally intended task is not used. Later in the paper, in Section 3 and in the algorithm box, the authors explain that they use the original task as well as the relabeled one. In the extreme case, we could imagine a set of trajectories that are successful for one task (but were potentially collected with another task in mind). In this case, the authors' algorithm would always pick the successful trajectories for that task, even though we know that informative negatives are crucial for off-policy RL algorithms.
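
To make the relabeling concern concrete, here is a minimal sketch of storing a trajectory under both its original task and a relabeled task. It is not the authors' implementation: the function name `relabel_and_store`, the `reward_fn` signature, the list-based buffer, and the "pick the candidate task with the highest return" rule are all assumptions made for illustration.

```python
def relabel_and_store(trajectory, original_task, candidate_tasks, reward_fn, buffer):
    """Store a trajectory under its original task and under one relabeled task.

    All names here are hypothetical; the selection rule (highest total reward)
    is an assumed stand-in for whatever relabeling strategy is actually used.
    """
    # Keep the original labeling, so that trajectories that failed at their
    # intended task remain in the buffer as negatives.
    buffer.append((trajectory, original_task))

    # Assumed relabeling rule: choose the candidate task under which this
    # trajectory collects the highest total reward.
    def total_return(task):
        return sum(reward_fn(state, action, task) for state, action in trajectory)

    best_task = max(candidate_tasks, key=total_return)
    buffer.append((trajectory, best_task))
    return best_task


# Toy usage: tasks are target positions, reward is negative distance to target.
buffer = []
trajectory = [(2, 0), (3, 0)]  # (state, action) pairs
reward_fn = lambda state, action, task: -abs(state - task)
chosen = relabel_and_store(trajectory, original_task=10,
                           candidate_tasks=[0, 3, 10],
                           reward_fn=reward_fn, buffer=buffer)
```

If the first `buffer.append` were dropped, every stored pair would be one that the selection rule already judged favorable, which is the situation the comment above warns about.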