Review for NeurIPS paper: Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

Neural Information Processing Systems 

Additional Feedback:

### On my overall decision

I am willing to largely upgrade my decision if the authors can provide strong evidence that is easy to check (i.e., "safety checks") to support the correctness of their propositions/theorems. But since the size m of the hidden layers becomes infinite, the set of weights tends to a fixed limiting distribution that is the same for all layers. Therefore, when m goes to infinity, the time-varying component gets smoothed out, and when L then also becomes infinite, we exactly recover an unrolled, one-layer recurrent neural network. (A minimal numerical sketch of this width-limit intuition is given at the end of this review.)

### Typos

- "By Representer theorem" -> "By the representer theorem"
- Fig. 2, caption: "CIFAR102" -> "CIFAR2"

### Reply to author response

Thank you for the additional plots provided in your response, which indeed nicely confirm your main theorems.
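### Numerical sketch of the width-limit intuition

As a concrete "safety check" of the claim that per-layer randomness averages out at large width, here is a minimal numerical sketch (my own illustration, not taken from the paper): the empirical feature kernel produced by L layers with independently sampled weights and the kernel produced by reusing a single shared weight matrix (the weight-tied, "unrolled RNN" reading) should agree increasingly well as the width m grows. The plain ReLU layers, He-style sqrt(2/m) scaling, depth L = 5, and all names below are assumptions chosen for illustration, not the paper's architecture or scaling.

```python
import numpy as np

def forward(x, weights):
    # Apply ReLU layers h <- relu(W h) for each weight matrix in `weights`.
    h = x
    for W in weights:
        h = np.maximum(W @ h, 0.0)
    return h

def feature_kernel(x1, x2, weights):
    # Empirical inner-product kernel of the top-layer features.
    return float(forward(x1, weights) @ forward(x2, weights))

rng = np.random.default_rng(0)
L = 5  # small fixed depth (assumption)
for m in [64, 256, 1024, 2048]:
    # Two fixed unit-norm inputs in R^m.
    x1 = rng.standard_normal(m); x1 /= np.linalg.norm(x1)
    x2 = rng.standard_normal(m); x2 /= np.linalg.norm(x2)
    # (a) L independently sampled layers vs. (b) one shared matrix reused L times.
    untied = [rng.standard_normal((m, m)) * np.sqrt(2.0 / m) for _ in range(L)]
    tied = [untied[0]] * L
    print(f"m={m:5d}  untied={feature_kernel(x1, x2, untied):.4f}"
          f"  tied={feature_kernel(x1, x2, tied):.4f}")
```

If the intuition above is right, the "untied" and "tied" kernel values should fluctuate around a common limit and move closer together as m increases; a discrepancy that persists at large m would indicate that the time-varying component does not simply smooth out.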