On the Generalization Benefit of Noise in Stochastic Gradient Descent
