Review for NeurIPS paper: Understanding Deep Architecture with Reasoning Layer

Neural Information Processing Systems 

Strengths: The most interesting aspect of this paper is the level of abstraction at which it treats such a large class of potential hybrid models. The problem setting is general enough that it is hard to come up with architectures that would not fit this scheme. While the results part of the paper starts by revisiting some well-known results on the convergence of gradient descent and Nesterov's method, the study of the sensitivity of the two algorithms to perturbations seems novel. The most interesting results come in Sections 4 and 5, where the authors present first results showing that faster convergence eventually leads to better approximation. I have found the Theorem 5.1 and the corresponding theorems in the Supp.