The alignment property of SGD noise and how it helps select flat minima: A stability analysis

Neural Information Processing Systems 

The phenomenon that stochastic gradient descent (SGD) favors flat minima has played a critical role in understanding the implicit regularization of SGD. In this paper, we provide an explanation of this striking phenomenon by relating the particular noise structure of SGD to its linear stability (Wu et al., 2018). Specifically, we consider training over-parameterized models with square loss.
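As a rough illustration of the quantities the abstract names, the sketch below sets up a toy over-parameterized linear model with square loss, runs SGD part-way toward an interpolating minimum, and measures two things: the sharpness (top Hessian eigenvalue) and the alignment between the empirical SGD noise covariance and the Hessian. The data, dimensions, and step counts are invented for illustration and are not from the paper's experiments; for a linear model the Hessian is constant, so this only shows how the alignment can be measured, not flat-minima selection itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): an over-parameterized
# linear model with square loss, n samples and d > n parameters.
n, d = 20, 50
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def full_grad(w):
    return X.T @ (X @ w - y) / n

def batch_grad(w, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

# Run SGD only part-way, so the loss is small but nonzero and the noise
# covariance is measurable (at exact interpolation the noise vanishes).
eta, B = 0.01, 4
w = 0.1 * rng.normal(size=d)
for _ in range(500):
    idx = rng.choice(n, size=B, replace=False)
    w -= eta * batch_grad(w, idx)

# For a linear model the Hessian of the square loss is constant: H = X^T X / n.
H = X.T @ X / n
sharpness = np.linalg.eigvalsh(H)[-1]  # top eigenvalue, the usual sharpness

# Empirical SGD noise covariance Sigma = E[(g_B - g)(g_B - g)^T] at w,
# estimated from many resampled mini-batch gradients.
g = full_grad(w)
samples = np.stack([
    batch_grad(w, rng.choice(n, size=B, replace=False)) - g
    for _ in range(2000)
])
Sigma = samples.T @ samples / len(samples)

# Alignment, measured here as the Frobenius cosine similarity between
# Sigma and H: values near 1 mean the noise concentrates in the sharp
# directions of the local landscape.
cos = np.sum(Sigma * H) / (np.linalg.norm(Sigma) * np.linalg.norm(H))
print(f"lambda_max(H) = {sharpness:.3f}")
print(f"Frobenius cosine(Sigma, H) = {cos:.3f}")
```

The point of measuring this alignment is that, when SGD noise concentrates in the sharp directions of the loss landscape, perturbations near a sharp minimum are amplified rather than damped, so linear stability of a minimum under SGD constrains how sharp it can be.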
