Review for NeurIPS paper: Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

Neural Information Processing Systems 

Additional Feedback:

### On my overall decision

I am willing to largely upgrade my decision if the authors can provide strong evidence that is easy to check (i.e., "safety checks") to support the correctness of their propositions/theorems. But since the size m of the hidden layers becomes infinite, the set of weights tends to a fixed limiting distribution that is the same for all layers. Therefore, when m goes to infinity, the time-varying component gets smoothed out, and when L then also becomes infinite, we exactly recover an unrolled, one-layer recurrent neural network. (A minimal numerical sketch of this width-limit intuition is given at the end of this review.)

### Typos

- "By Representer theorem" -> "By the representer theorem"
- Fig. 2, caption: "CIFAR102" -> "CIFAR2"

### Reply to author response

Thank you for the additional plots provided in your response, which indeed nicely confirm your main theorems.
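### Numerical sketch of the width-limit intuition

As a concrete "safety check" of the claim that per-layer randomness averages out at large width, here is a minimal numerical sketch (my own illustration, not taken from the paper): the empirical feature kernel produced by L layers with independently sampled weights and the kernel produced by reusing a single shared weight matrix (the weight-tied, "unrolled RNN" reading) should agree increasingly well as the width m grows. The plain ReLU layers, He-style sqrt(2/m) scaling, depth L = 5, and all names below are assumptions chosen for illustration, not the paper's architecture or scaling.

```python
import numpy as np

def forward(x, weights):
    # Apply ReLU layers h <- relu(W h) for each weight matrix in `weights`.
    h = x
    for W in weights:
        h = np.maximum(W @ h, 0.0)
    return h

def feature_kernel(x1, x2, weights):
    # Empirical inner-product kernel of the top-layer features.
    return float(forward(x1, weights) @ forward(x2, weights))

rng = np.random.default_rng(0)
L = 5  # small fixed depth (assumption)
for m in [64, 256, 1024, 2048]:
    # Two fixed unit-norm inputs in R^m.
    x1 = rng.standard_normal(m); x1 /= np.linalg.norm(x1)
    x2 = rng.standard_normal(m); x2 /= np.linalg.norm(x2)
    # (a) L independently sampled layers vs. (b) one shared matrix reused L times.
    untied = [rng.standard_normal((m, m)) * np.sqrt(2.0 / m) for _ in range(L)]
    tied = [untied[0]] * L
    print(f"m={m:5d}  untied={feature_kernel(x1, x2, untied):.4f}"
          f"  tied={feature_kernel(x1, x2, tied):.4f}")
```

If the intuition above is right, the "untied" and "tied" kernel values should fluctuate around a common limit and move closer together as m increases; a discrepancy that persists at large m would indicate that the time-varying component does not simply smooth out.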