Review for NeurIPS paper: Learning Guidance Rewards with Trajectory-space Smoothing


Weaknesses: While the experimental results are promising on the episodic MuJoCo tasks, it is unclear whether IRCR works in more complicated settings (e.g., hard-exploration Atari games, sparse-reward tasks, robotic arm pick-and-place tasks, etc.). Specifically, in the MuJoCo tasks considered in the paper, the episodic reward can be obtained by performing simple, repetitive behavior (i.e., continually moving forward). But often, solving an environment may require action sequences that are rarely overlapping or repetitive (e.g., a robotic agent that needs to pick up a block and place it at a specified location, or a maze-exploring agent that needs to pick up a key to open the exit). In such cases, the algorithm must correctly associate these non-repetitive events with the final return in order to solve the problem. I wonder whether IRCR works in such cases as well.
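To make the concern concrete, here is a minimal sketch of my reading of IRCR's uniform credit assignment: each transition in a trajectory receives the same guidance reward, namely the trajectory's episodic return normalized by the range of returns observed so far. The function and variable names below are illustrative, not taken from the paper's code.

```python
def ircr_guidance_reward(episode_return, r_min, r_max):
    """Uniform guidance reward (my reading of IRCR): every transition in a
    trajectory is credited with the trajectory's return, min-max normalized
    over the returns seen so far in the buffer."""
    if r_max == r_min:
        return 0.0  # no spread in returns yet; no informative credit
    return (episode_return - r_min) / (r_max - r_min)

# Toy illustration of the concern: a repetitive trajectory and a
# non-repetitive one with the same return receive identical per-step
# credit, so the critical step (e.g., picking up the key) is not singled out.
returns_seen = [0.0, 10.0, 4.0]            # returns observed in the buffer
r_min, r_max = min(returns_seen), max(returns_seen)

repetitive_traj = ["step_forward"] * 4      # same action repeated
key_door_traj = ["move", "pick_key", "move", "open_door"]

for traj, ret in [(repetitive_traj, 4.0), (key_door_traj, 4.0)]:
    g = ircr_guidance_reward(ret, r_min, r_max)
    per_step_credit = [g for _ in traj]     # uniform across all steps
    print(traj, per_step_credit)
```

In the repetitive case, uniform credit is harmless because every step genuinely contributes in the same way; in the key-door case, the pick-up step receives no more credit than an ordinary move, which is exactly the situation where I question whether IRCR's smoothing suffices.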