The Implicit Bias of Minima Stability: A View from Function Space

Neural Information Processing Systems 

The loss landscapes of over-parameterized neural networks contain multiple global minima. However, it is well known that stochastic gradient descent (SGD) can stably converge only to minima that are sufficiently flat w.r.t.
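The flatness requirement above can be illustrated with the classical linear-stability condition for (full-batch) gradient descent: on a one-dimensional quadratic with curvature (sharpness) `a` and step size `lr`, the iterates contract iff `a < 2 / lr`. The following is a minimal sketch of this well-known fact, not the paper's own analysis; the function and parameter names are illustrative.

```python
def gd_on_quadratic(sharpness, lr=0.1, steps=50, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2.

    The iterates satisfy x_{t+1} = (1 - lr * sharpness) * x_t, so GD
    converges iff |1 - lr * sharpness| < 1, i.e. sharpness < 2 / lr.
    """
    x = x0
    for _ in range(steps):
        x -= lr * sharpness * x  # gradient of f is sharpness * x
    return x

# With lr = 0.1 the stability threshold is 2 / 0.1 = 20.
flat = gd_on_quadratic(sharpness=5.0)    # 5 < 20: contraction factor 0.5
sharp = gd_on_quadratic(sharpness=30.0)  # 30 > 20: factor -2, divergence
print(abs(flat) < 1e-6, abs(sharp) > 1e6)  # → True True
```

For SGD on neural networks the analogous condition involves the Hessian at the minimum rather than a scalar curvature, but the same mechanism determines which minima are dynamically accessible.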