Reviews: Neural Temporal-Difference Learning Converges to Global Optima

Neural Information Processing Systems 

Originality: The paper builds on recent results on the implicit local linearization effect of overparametrized neural networks in supervised learning, and on recent non-asymptotic analyses of Linear TD and Linear Q-learning. Perhaps the main insight is the relationship between the explicit linearization of Linear TD and the implicit linearization of overparametrized neural TD. Related work is properly referenced.

Quality: The paper appears to be technically sound (although I have only skimmed the proofs). The convergence results for the three algorithms, namely Neural TD, Neural Q-learning, and Neural Soft Q-learning, constitute a complete piece of work.
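For context, the Linear TD baseline against which the neural analysis is compared can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the paper's setup): semi-gradient TD(0) with linear features on a toy two-state Markov reward process, where the iterates converge to the closed-form TD fixed point. The chain, features, and step size are all assumptions made for the example.

```python
import numpy as np

# Hedged sketch: semi-gradient TD(0) with linear features on a tiny,
# made-up 2-state Markov reward process. The neural-TD analysis discussed
# in the review concerns wide networks whose training dynamics implicitly
# stay close to such a linear model.

rng = np.random.default_rng(0)

# Two states that deterministically alternate; reward 1 in state 0.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
r = np.array([1.0, 0.0])
gamma = 0.9

# One-hot features, so Linear TD reduces to tabular TD here.
phi = np.eye(2)
w = np.zeros(2)
alpha = 0.1

s = 0
for _ in range(5000):
    s_next = rng.choice(2, p=P[s])
    # Semi-gradient TD(0): w += alpha * delta * grad_w V(s),
    # with V(s) = phi(s)^T w and delta the TD error.
    delta = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w = w + alpha * delta * phi[s]
    s = s_next

# Closed-form fixed point of the Bellman equation: V = (I - gamma P)^{-1} r
v_true = np.linalg.solve(np.eye(2) - gamma * P, r)
print(w, v_true)
```

The explicit linearization of Linear TD is visible here: the update is linear in `w`, which is what makes the non-asymptotic analysis tractable; the paper's contribution is showing that overparametrized neural TD implicitly stays in this regime.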