Review for NeurIPS paper: Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
–Neural Information Processing Systems
Weaknesses: - One weakness of the successor representation is that it is policy-dependent, so in the control setting it would need to be relearned whenever the policy is modified. One-step models, by contrast, might not suffer from this problem, since they are also conditioned on actions. Could you comment on this issue? It would also seem that, when the model outputs a prediction, the agent would not know how far into the future the predicted state lies: it could be the very next state or one far into the future.
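To make the policy-dependence concern concrete, here is a minimal tabular sketch. It is not the paper's method. The 3-state chain, the action names, and the helper functions are all hypothetical and chosen only for illustration. It computes the successor representation by iterating its Bellman equation, M = I + gamma * P_pi * M, for two different policies that share the same action-conditioned one-step model.

```python
# Toy 3-state chain with two actions (hypothetical example, not from the paper).
# P[a][s][s2] is the one-step transition probability under action a.
GAMMA = 0.9
N = 3
P = {
    "left":  [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "right": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
}


def policy_transitions(policy):
    """Mix the action-conditioned model: P_pi[s][s2] = sum_a pi(a|s) * P[a][s][s2]."""
    return [
        [sum(policy[s][a] * P[a][s][s2] for a in P) for s2 in range(N)]
        for s in range(N)
    ]


def successor_representation(policy, iters=500):
    """Iterate M <- I + gamma * P_pi @ M until (numerically) converged."""
    P_pi = policy_transitions(policy)
    M = [[0.0] * N for _ in range(N)]
    for _ in range(iters):
        M = [
            [
                (1.0 if s == s2 else 0.0)
                + GAMMA * sum(P_pi[s][k] * M[k][s2] for k in range(N))
                for s2 in range(N)
            ]
            for s in range(N)
        ]
    return M


always_left = [{"left": 1.0, "right": 0.0} for _ in range(N)]
always_right = [{"left": 0.0, "right": 1.0} for _ in range(N)]

M_left = successor_representation(always_left)
M_right = successor_representation(always_right)

# The one-step model P is shared, but the SR rows differ between policies.
print([round(x, 3) for x in M_left[1]])
print([round(x, 3) for x in M_right[1]])

# Scaling a row by (1 - gamma) gives a distribution over future states.
# It sums over all time offsets, so a sample from it carries no timestamp.
occupancy_right = [(1 - GAMMA) * x for x in M_right[1]]
print(round(sum(occupancy_right), 6))
```

In this toy setting the dictionary `P` never changes, while `M_left` and `M_right` disagree, which is the relearning cost raised above. The normalised row also illustrates the second concern: it mixes the next state and distant states into one distribution, so the time offset of any single sample is not recoverable from the sample itself.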
Jan-21-2025, 20:43:21 GMT