Review for NeurIPS paper: Provably Efficient Neural GTD for Off-Policy Learning


Weaknesses: The strategy of establishing convergence guarantees for neural networks as a function of the number of neurons alone is questionable, because width is a very coarse description of a network: comparable guarantees can already be obtained for nonparametric estimators (e.g., Cho and Saul, "Kernel Methods for Deep Learning," NIPS 2009, and numerous follow-up works). If neural-network analysis is to refine that line of work, it must also account for the *inter-layer* relationships and broader architectural choices in order to be useful to practitioners. As it stands, I do not see how the width m of Lemma 4.1 can inform the choice of a neural architecture in any sharper way than, e.g., a single-layer RBF network. Moreover, reformulating the Bellman equations as saddle-point problems has been studied previously: Shapiro, A. (2011).
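For context on the last point: the primal-dual structure that a saddle-point reformulation of the Bellman equation makes explicit is already visible in classical linear GTD2, where the value weights play the primal role and an auxiliary vector tracking the projected TD error plays the dual role. A minimal illustrative sketch on a toy two-state chain (the features, rewards, transition matrix, and step sizes below are my own illustrative choices, not the paper's):

```python
import numpy as np

# Minimal sketch of linear GTD2 on a toy 2-state Markov chain.
# theta is the primal (value-function) variable; omega is the dual
# variable tracking the projected TD error. All quantities here
# (features, rewards, step sizes) are illustrative.

rng = np.random.default_rng(0)

gamma = 0.9
phi = np.eye(2)              # one indicator feature per state
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])   # target-policy transition matrix
r = np.array([1.0, 0.0])     # expected rewards per state

theta = np.zeros(2)          # primal: value-function weights
omega = np.zeros(2)          # dual: tracks the projected TD error
alpha, beta = 0.05, 0.1      # two-timescale step sizes (beta > alpha)

for t in range(20000):
    s = rng.integers(2)                  # uniform state sampling
    s_next = rng.choice(2, p=P[s])
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    # Dual update on omega (fast timescale).
    omega += beta * (delta - phi[s] @ omega) * phi[s]
    # Primal update on theta (slow timescale), GTD2 direction.
    theta += alpha * (phi[s] - gamma * phi[s_next]) * (phi[s] @ omega)

# TD fixed point for comparison: solve A theta = b with
# A = Phi^T D (Phi - gamma P Phi), b = Phi^T D r, D = diag(state dist.).
D = np.diag([0.5, 0.5])
A = phi.T @ D @ (phi - gamma * P @ phi)
b = phi.T @ D @ r
theta_star = np.linalg.solve(A, b)
print(theta, theta_star)
```

The two step sizes are the usual two-timescale requirement: omega must equilibrate faster than theta, which is exactly the inner maximization / outer minimization of the saddle-point view.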