Review for NeurIPS paper: Value-driven Hindsight Modelling

Neural Information Processing Systems 

Learning value functions is a central problem in reinforcement learning, made difficult in part by the non-stationarity introduced by bootstrapping. This paper proposes a fresh approach to improving value-function learning by conditioning the value function, at training time, on information about future states (hindsight). Conditioning on the right future data should reduce uncertainty about the return. All the reviewers liked the paper's premise, clear motivation, and thorough experiments.
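To make the premise concrete, here is a minimal linear-regression sketch of the hindsight idea, not the paper's actual architecture: a hindsight value is fit with access to a future summary `phi`, a separate model learns to predict `phi` from the state, and the two are composed at evaluation time. All names and the toy data-generating process are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: the return G depends on the state s and on phi, a
# low-dimensional summary of the future trajectory that is observable
# only in hindsight (i.e. after the trajectory has been collected).
n, d = 2000, 5
w = rng.normal(size=d)
u = 1.5
S = rng.normal(size=(n, d))
# phi is partly predictable from s, plus future randomness.
phi = S @ w * 0.5 + rng.normal(scale=0.1, size=n)
G = S @ w + u * phi + rng.normal(scale=0.05, size=n)  # realised returns

def lstsq_fit(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# 1) Hindsight value v+(s, phi): regress returns on (s, phi).
#    The hindsight feature absorbs future randomness, giving
#    lower-variance regression targets.
Xh = np.column_stack([S, phi])
theta_h = lstsq_fit(Xh, G)

# 2) Hindsight model: predict the future summary phi from the state.
beta = lstsq_fit(S, phi)

# 3) At evaluation time phi is unavailable, so plug in the model's
#    prediction: v(s) = v+(s, phi_hat(s)).
def value(S_new):
    phi_hat = S_new @ beta
    return np.column_stack([S_new, phi_hat]) @ theta_h

# Baseline for comparison: direct regression of G on s alone.
theta_d = lstsq_fit(S, G)
```

In this toy setup the hindsight regression fits the observed returns much more tightly than the direct one, because `phi` explains variance that the state alone cannot; the interesting question the paper studies is when composing `v+` with a learned `phi_hat` also helps the final value estimate.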