Reviews: Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

Neural Information Processing Systems 

The paper proposes episodic backward updates to improve data efficiency in RL tasks. The authors further introduce a soft relaxation of these updates to combat the overestimation that backward updates typically cause when combined with neural network function approximators. Overall, the paper is very clearly written. My main concerns are with the experimental details and the literature review; once the existing literature is taken into account, the novelty of the work is quite limited. The idea of backward updates is quite old and goes back at least to the 1993 paper "Prioritized Sweeping" by Moore and Atkeson, which demonstrates a method very similar to the one proposed here and which the authors fail to cite. Furthermore, several recent papers operate in a similar space of ideas and use a backward view in ways similar to the authors, e.g.: Fast deep reinforcement learning using online adjustments from the past, https://arxiv.org/abs/1810.08163
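
For readers unfamiliar with the idea under discussion, the backward update with a soft relaxation can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the function name `backward_episode_targets`, the mixing coefficient `beta`, and the list-based table `q_next` are assumptions made here, and the sketch assumes the episode ends in a terminal state. Targets are computed from the last transition to the first, so a reward at the end of the episode reaches earlier steps in a single sweep.

```python
def backward_episode_targets(q_next, actions, rewards, gamma=0.99, beta=1.0):
    """Compute Q-learning targets for one episode, sweeping backward in time.

    q_next[t] : current estimates Q(s_{t+1}, a) for every action a
    actions[t]: action taken at step t
    rewards[t]: reward received at step t
    beta      : hypothetical mixing coefficient in [0, 1];
                1.0 propagates targets fully, 0.0 gives plain one-step targets
    """
    q = [list(row) for row in q_next]  # work on a copy, leave the input untouched
    T = len(rewards)
    targets = [0.0] * T
    targets[T - 1] = float(rewards[T - 1])  # terminal step: nothing to bootstrap from

    for t in range(T - 2, -1, -1):
        a_next = actions[t + 1]
        # Blend the freshly computed target of step t+1 into the stored estimate.
        q[t][a_next] = beta * targets[t + 1] + (1.0 - beta) * q[t][a_next]
        targets[t] = rewards[t] + gamma * max(q[t])
    return targets


# Three-step episode with a single reward at the end and all estimates at zero.
q0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
full = backward_episode_targets(q0, [0, 1, 0], [0.0, 0.0, 1.0], gamma=0.5, beta=1.0)
one_step = backward_episode_targets(q0, [0, 1, 0], [0.0, 0.0, 1.0], gamma=0.5, beta=0.0)
```

With `beta=1.0` the final reward reaches the first step immediately, whereas with `beta=0.0` it only affects the last target. Intermediate values trade propagation speed against the risk of compounding overestimated values through the `max`.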