Review for NeurIPS paper: Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

Jan-21-2025, 20:43:21 GMT–Neural Information Processing Systems

Weaknesses: - One weakness of the successor representation is that it is policy-dependent, so in the control setting it would need to be relearned whenever the policy is modified. One-step models, by contrast, may not suffer from this problem, since they are also conditioned on actions. Could you comment on this issue? Separately, it would seem that when the model outputs a prediction, the agent would not know how far into the future the predicted state lies: it could be the very next state or one far in the future.
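
To make both concerns concrete, here is a minimal tabular sketch. It is not the paper's method. The toy 3-state chain, the helper name `successor_representation`, and the use of the standard closed form M = (I - gamma * P_pi)^{-1} are all assumptions made for illustration. The second part further assumes that a sample from a normalized discounted occupancy corresponds to a geometrically distributed step offset, which is the usual reading of such a distribution.

```python
import numpy as np

def successor_representation(P, policy, gamma):
    """Tabular SR under a fixed policy: M = (I - gamma * P_pi)^{-1}.

    P: (S, A, S) transition tensor; policy: (S, A) action probabilities.
    Hypothetical helper, for illustration only.
    """
    P_pi = np.einsum("sa,sat->st", policy, P)  # policy-averaged transitions
    n = P_pi.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P_pi)

# Toy chain (assumed): action 0 stays put, action 1 moves right;
# the last state is absorbing under both actions.
P = np.zeros((3, 2, 3))
for s in range(3):
    P[s, 0, s] = 1.0
    P[s, 1, min(s + 1, 2)] = 1.0

gamma = 0.9
stay = np.tile([1.0, 0.0], (3, 1))  # always take action 0
move = np.tile([0.0, 1.0], (3, 1))  # always take action 1

# The one-step model P is shared, but the SR differs per policy,
# so changing the policy invalidates the learned SR.
M_stay = successor_representation(P, stay, gamma)
M_move = successor_representation(P, move, gamma)

# Horizon ambiguity: under the assumed geometric weighting, the offset of a
# sampled future state is Geometric(1 - gamma) on {1, 2, ...}; the sample
# itself carries no record of which offset produced it.
rng = np.random.default_rng(0)
horizons = rng.geometric(1.0 - gamma, size=100_000)
```

In this sketch the same transition tensor `P` yields two different successor representations, and the sampled offsets range from a single step to many steps with mean 1 / (1 - gamma), which is the ambiguity the question above asks the authors to address.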



