The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
