Review for NeurIPS paper: Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
–Neural Information Processing Systems
Summary: This paper proposes a new model-based RL algorithm that, instead of learning state transition probabilities, learns the occupancy distribution over an infinite horizon. The method can be seen as an extension of the successor representation to continuous state-action spaces and to infinite horizons. The occupancy distribution is modeled as an energy function and learned with temporal differences (TD), using a GAN. The experiments on a few MuJoCo problems clearly show the advantages of the proposed approach compared to RL algorithms such as PPO and SAC. The reviewers agree that the proposed method is new, interesting, and validated by the simulation experiments.
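To make the idea of learning an occupancy distribution with temporal differences concrete, here is a minimal tabular sketch. It is not the paper's method: it assumes a small finite Markov chain with a known transition matrix `P`, a discount `gamma`, and a discounted occupancy defined as G = (1 - gamma) * sum over k >= 1 of gamma^(k-1) * P^k. The function names and the example matrix are hypothetical, chosen only for illustration.

```python
import numpy as np


def occupancy_closed_form(P, gamma):
    """Discounted occupancy G = (1 - gamma) * P @ inv(I - gamma * P).

    Assumption: P is a row-stochastic matrix of a finite Markov chain.
    """
    n = P.shape[0]
    return (1.0 - gamma) * P @ np.linalg.inv(np.eye(n) - gamma * P)


def occupancy_td(P, gamma, iters=500):
    """Fixed-point iteration of the TD-style recursion
    G = (1 - gamma) * P + gamma * P @ G,
    which bootstraps the occupancy from its own value at the next state.
    """
    n = P.shape[0]
    G = np.full((n, n), 1.0 / n)  # arbitrary initial guess
    for _ in range(iters):
        G = (1.0 - gamma) * P + gamma * P @ G
    return G


# Hypothetical 3-state chain, used only for illustration.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.3, 0.0, 0.7],
])
gamma = 0.9

G_exact = occupancy_closed_form(P, gamma)
G_td = occupancy_td(P, gamma)

# Each row of G is a probability distribution over future states.
print(np.allclose(G_td, G_exact))
print(G_exact.sum(axis=1))
```

In this tabular setting the recursion converges to the closed-form solution because the update is a contraction for `gamma < 1`. The continuous setting described in the summary replaces the matrix `G` with a learned generative model.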
Jan-21-2025, 20:43:14 GMT