Reviews: Non-delusional Q-learning and value-iteration

Neural Information Processing Systems 

The paper defines a new type of reinforcement learning algorithm that accounts for the imperfections of the function approximator. Rather than assuming the approximator is perfect, it seeks the best policy attainable given those imperfections, thus avoiding the pathologies that arise when a flawed approximator is treated as exact. The quality of the paper is high: the new type of RL algorithm it introduces is clearly motivated and solid. The weaker points are: 1. The complexity of the proposed algorithm seems too high for it to be immediately applicable to interesting problems.
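
To make the pathology described above concrete, here is a toy sketch in Python. It is not the paper's algorithm. The two states, the scalar weight theta, the feature table PHI and the per-state values in VALUE are all hypothetical, chosen only so that a backup taking the maximum in each state independently relies on a joint action choice that no single greedy policy in the restricted class can express.

```python
from itertools import product

# Hypothetical toy setup: two states, two actions, one scalar weight theta.
# PHI[s][a] is the feature of taking action a in state s.
PHI = {
    "s1": {"a1": 1.0, "a2": 0.0},
    "s2": {"a1": 0.0, "a2": 1.0},
}

# Hypothetical per-state action values, made up for illustration only.
VALUE = {
    "s1": {"a1": 1.0, "a2": 0.0},
    "s2": {"a1": 2.0, "a2": 0.5},
}

STATES = sorted(PHI)


def greedy_assignment(theta):
    """Actions picked in (s1, s2) by the greedy policy for Q(s, a) = theta * phi(s, a)."""
    return tuple(max(PHI[s], key=lambda a: theta * PHI[s][a]) for s in STATES)


def realizable_assignments(thetas=(-1.0, 1.0)):
    """Joint action choices some theta can express.

    With a scalar theta only its sign matters; the tie at theta = 0 is ignored.
    """
    return {greedy_assignment(t) for t in thetas}


def all_assignments():
    """Every joint action choice, realizable or not."""
    return set(product(*(sorted(PHI[s]) for s in STATES)))


def total_value(assignment):
    """Sum of the hypothetical per-state values under a joint action choice."""
    return sum(VALUE[s][a] for s, a in zip(STATES, assignment))


def naive_backup_value():
    """Maximise in each state independently, as if any joint choice were available."""
    return sum(max(VALUE[s].values()) for s in STATES)


def best_realizable_value():
    """Maximise only over joint choices the policy class can actually express."""
    return max(total_value(a) for a in realizable_assignments())


if __name__ == "__main__":
    print("realizable:  ", sorted(realizable_assignments()))
    print("unrealizable:", sorted(all_assignments() - realizable_assignments()))
    print("naive backup value:   ", naive_backup_value())
    print("best realizable value:", best_realizable_value())
```

In this toy, the independent maximisation selects a1 in both states, a combination outside the realizable set, so its value overstates what any policy in the class can achieve. Enumerating joint assignments this way grows exponentially with the number of states, which gives a rough sense of why complexity is a natural concern for approaches that track realizability explicitly.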