Review for NeurIPS paper: Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

Jan-21-2025, 20:43:14 GMT – Neural Information Processing Systems

Summary: This paper proposes a new model-based RL algorithm that, instead of learning state transition probabilities, learns the occupancy distribution over an infinite horizon. The method can be seen as an extension of the successor representation to continuous state-action spaces and to infinite horizons. The occupancy distribution is modeled as an energy function and learned with temporal differences (TD), using a GAN. The experiments on a few MuJoCo problems clearly show the advantages of the proposed approach over RL algorithms such as PPO and SAC. The reviewers agree that the proposed method is new, interesting, and validated by the simulation experiments.
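
For readers unfamiliar with the successor representation mentioned in the summary, the following is a minimal sketch of its tabular, discrete-state form learned with TD(0) updates. It is not the paper's method, which targets continuous spaces with a generative model; the function name `td_successor_representation`, the three-state cycle, and the values of `gamma`, `alpha`, and `sweeps` are illustrative assumptions.

```python
def td_successor_representation(next_state, n_states, gamma=0.5, alpha=0.5, sweeps=200):
    """Estimate a tabular successor representation M with TD(0) updates.

    M[s][j] approximates the expected discounted number of visits to
    state j when starting in state s and following the fixed,
    deterministic transitions given by next_state (a toy assumption).
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(sweeps):
        for s in range(n_states):
            s_next = next_state[s]
            for j in range(n_states):
                # TD target: immediate visit indicator plus the discounted
                # occupancy estimate bootstrapped from the next state.
                indicator = 1.0 if j == s else 0.0
                target = indicator + gamma * M[s_next][j]
                M[s][j] += alpha * (target - M[s][j])
    return M


# Hypothetical toy environment: a deterministic cycle 0 -> 1 -> 2 -> 0.
next_state = [1, 2, 0]
M = td_successor_representation(next_state, n_states=3)
```

Each row of `M` is an unnormalized discounted occupancy over future states, so its entries should sum to roughly 1 / (1 - gamma). The bootstrapped target is the same temporal-difference idea the summary describes, applied here to a lookup table rather than a learned generative model.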

generative temporal difference learning, infinite-horizon prediction, occupancy distribution
