Review for NeurIPS paper: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

The paper is strongly theoretically grounded, with clear explanations of the intuition and proofs justifying the approximations used. The significance of the contribution is large: most modern RL algorithms belong to the approximate dynamic programming (ADP) family that the paper proposes to modify, and the corrective-feedback reweighting can be slotted into most training loops without compatibility issues. As the authors note, it could also be used to guide exploration rather than only for post hoc correction of the transition distribution. This is clearly relevant to the NeurIPS community, much of which makes use of this family of RL algorithms.
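To illustrate the "slots into most training loops" claim, here is a minimal, hedged sketch of how a DisCor-style weighting could be added to a generic tabular ADP update. The error estimator `Delta`, temperature `tau`, and the weight `exp(-gamma * Delta / tau)` follow the paper's notation, but the toy environment, learning rate, and the simplified `Delta` recursion here are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def discor_update(Q, Delta, batch, gamma=0.99, tau=10.0, lr=0.5):
    """One weighted Bellman backup over a batch of (s, a, r, s', done) tuples.

    Transitions whose bootstrap targets are estimated to carry large
    accumulated error (Delta at the next state) are down-weighted,
    approximating the corrective-feedback reweighting described in the paper.
    """
    weights = []
    for s, a, r, s2, done in batch:
        next_max = 0.0 if done else Q[s2].max()
        target = r + gamma * next_max
        td_error = target - Q[s, a]
        # Estimated error of the bootstrap target (simplified Delta recursion).
        next_delta = 0.0 if done else Delta[s2].max()
        w = np.exp(-gamma * next_delta / tau)  # down-weight unreliable targets
        Q[s, a] += lr * w * td_error
        # Track accumulated Bellman error: |td_error| + gamma * Delta(s').
        Delta[s, a] += lr * (abs(td_error) + gamma * next_delta - Delta[s, a])
        weights.append(w)
    return weights

# Usage on a toy 3-state chain MDP with 2 actions:
Q = np.zeros((3, 2))
Delta = np.zeros((3, 2))
batch = [(0, 1, 0.0, 1, False), (1, 1, 0.0, 2, False), (2, 1, 1.0, 2, True)]
for _ in range(50):
    ws = discor_update(Q, Delta, batch)
```

The key point is that only the per-transition weight changes; the surrounding replay-and-backup loop is untouched, which is why the method composes with most ADP-based agents.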