Label Noise SGD Provably Prefers Flat Global Minimizers

Neural Information Processing Systems 

In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.
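
As a rough illustration of this claim, the sketch below runs SGD with fresh label noise on a toy overparametrized model. Everything in it is an assumption made for illustration and is not taken from the paper: the two-parameter model f(u, v) = u·v fit to a single target y = 1, the ±sigma label perturbation, the step size, and the use of the Hessian trace u² + v² as a sharpness measure. Every point with u·v = 1 is a global minimizer, and the balanced point u = v = 1 is the flattest one under that measure.

```python
import random


def label_noise_sgd(u, v, y=1.0, lr=0.1, sigma=0.5, steps=5000, seed=0):
    """SGD on 0.5 * (u*v - noisy_y)**2 with a fresh +/-sigma label perturbation each step.

    The toy model and all hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        noisy_y = y + rng.choice((-sigma, sigma))  # resample the label noise
        r = u * v - noisy_y                        # residual against the noisy label
        u, v = u - lr * r * v, v - lr * r * u      # simultaneous gradient step
    return u, v


def sharpness(u, v):
    """Trace of the Hessian of 0.5 * (u*v - 1)**2 at a point with u*v = 1."""
    return u * u + v * v


# Start at a sharp global minimizer: u*v = 1 but u and v are unbalanced.
u0, v0 = 2.0, 0.5
u_gd, v_gd = label_noise_sgd(u0, v0, sigma=0.0)  # no label noise: gradient is zero
u_ln, v_ln = label_noise_sgd(u0, v0, sigma=0.5)  # label noise: drifts along the minimizers

print("initial sharpness:      ", sharpness(u0, v0))
print("without label noise:    ", sharpness(u_gd, v_gd))
print("with label noise:       ", sharpness(u_ln, v_ln))
print("imbalance |u^2 - v^2|:  ", abs(u_ln**2 - v_ln**2))
```

In this toy setting each update multiplies u² − v² by (1 − lr²·r²), so without noise the iterate stays at the sharp minimizer it started from, while label noise keeps the residual nonzero and steadily shrinks the imbalance toward the flat, balanced minimizer.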
