Review for NeurIPS paper: Value-driven Hindsight Modelling
Neural Information Processing Systems
Learning value functions is a central theme in reinforcement learning. It is a hard problem because bootstrapping makes the learning targets non-stationary. This paper proposes a fresh approach to improving value function learning: at training time, the value function is conditioned on information about future states (hindsight). Conditioning on the right future data should provide more certainty about the future return. All reviewers liked the paper's premise, clear motivation, and thorough experiments.
Jan-26-2025, 15:23:02 GMT