Review for NeurIPS paper: Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

Neural Information Processing Systems 

Weaknesses: - One weakness of the successor representation is that it is policy-dependent, so in the control setting it would need to be relearned whenever the policy is modified. One-step models, on the other hand, would perhaps not suffer from this problem, since they are also conditioned on actions. Could you comment on this issue?
- It would also seem that when the model outputs a prediction, the agent does not know how far into the future the predicted state lies: it could be the very next state or one far into the future.
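To make both concerns concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration and not taken from the paper: the 2-state, 2-action MDP, the two policies, the discount of 0.9, and all function names are hypothetical, and the target distribution is assumed to be the standard normalised discounted occupancy, (1 - gamma) * sum over dt >= 1 of gamma^(dt - 1) * Pr(s_{t+dt} | s_t, policy). The one-step model `P` is shared by both policies, while the occupancy has to be recomputed for each one, and a sampled future state does not reveal its time offset unless the offset is returned separately.

```python
import random

GAMMA = 0.9  # hypothetical discount, chosen only for illustration

# Hypothetical 2-state, 2-action MDP: P[s][a] is the next-state distribution.
# This one-step model does not depend on any policy.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # from state 0, under action 0 / action 1
    [[0.7, 0.3], [0.1, 0.9]],  # from state 1, under action 0 / action 1
]


def policy_transition(policy):
    """Marginalise actions out: P_pi[s][s2] = sum_a pi(a|s) * P[s][a][s2]."""
    n = len(P)
    return [
        [sum(policy[s][a] * P[s][a][s2] for a in range(len(P[s]))) for s2 in range(n)]
        for s in range(n)
    ]


def matmul(A, B):
    """Plain square-matrix product on nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def discounted_occupancy(P_pi, gamma=GAMMA, iters=500):
    """Iterate mu <- (1 - gamma) * P_pi + gamma * P_pi @ mu to its fixed point."""
    n = len(P_pi)
    mu = [[1.0 / n] * n for _ in range(n)]
    for _ in range(iters):
        P_mu = matmul(P_pi, mu)
        mu = [
            [(1 - gamma) * P_pi[i][j] + gamma * P_mu[i][j] for j in range(n)]
            for i in range(n)
        ]
    return mu


def sample_future_state(P_pi, s, gamma=GAMMA, rng=random):
    """Sample from the discounted occupancy by rolling out a geometric number of steps.

    The offset dt is returned here only for inspection; a model that outputs
    just the state would hide it from the agent.
    """
    dt = 1
    while rng.random() < gamma:  # continue with probability gamma
        dt += 1
    for _ in range(dt):
        s = rng.choices(range(len(P_pi)), weights=P_pi[s])[0]
    return s, dt


# Two hypothetical deterministic policies: always action 0, always action 1.
pi_a = [[1, 0], [1, 0]]
pi_b = [[0, 1], [0, 1]]

P_pi_a = policy_transition(pi_a)
P_pi_b = policy_transition(pi_b)
mu_a = discounted_occupancy(P_pi_a)
mu_b = discounted_occupancy(P_pi_b)

if __name__ == "__main__":
    print("occupancy from state 0 under pi_a:", mu_a[0])
    print("occupancy from state 0 under pi_b:", mu_b[0])
    print("sampled (state, hidden offset):", sample_future_state(P_pi_a, 0))
```

In this sketch the same `P` serves both policies, whereas `mu_a` and `mu_b` differ, which is the relearning cost raised in the first point. The offset returned by `sample_future_state` is geometrically distributed with mean 1 / (1 - gamma), so a sampled state may be one step or many steps ahead, which is the ambiguity raised in the second point.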