Reviews: Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Neural Information Processing Systems 

This paper provides a generalization bound for training over-parameterized deep neural networks with ReLU activation and cross-entropy loss using SGD. Initially the paper received mixed reviews, with two positive reviews and one negative. On the one hand, the analysis was found to be intuitive, general, and potentially influential; the generalization bound was found to be more general and sharper than many existing generalization error bounds for over-parameterized neural networks; and the paper was judged to be very well written. On the other hand, the width requirement was found to be too strict. The rebuttal addressed the issues raised by the reviewers: one rating was increased from 6 to 8, and the negative reviewer updated their score to 6. Upon discussion, the reviewers agreed that the paper should be accepted.