When Do Flat Minima Optimizers Work?

Neural Information Processing Systems

Theoretical and empirical studies [21, 77, 9, 55, 49, 5, 12] postulate that such flatter regions generalize better than sharper minima, e.g., due to the flat minimizer's robustness against loss function shifts between train and test data, as illustrated in Fig. 1.
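
The robustness argument can be made concrete with a toy one-dimensional sketch. This is not the paper's experimental setup: it assumes a quadratic train loss, models the train/test mismatch as a pure horizontal shift of that loss, and uses hypothetical names (`quadratic_loss`, `loss_increase_under_shift`) and illustrative curvature values chosen only for this example.

```python
def quadratic_loss(w, curvature, center=0.0):
    """1-D quadratic loss with its minimum at `center` (toy assumption)."""
    return 0.5 * curvature * (w - center) ** 2


def loss_increase_under_shift(curvature, shift):
    """Test loss minus train loss, evaluated at the train minimizer.

    The test loss is assumed to be the train loss translated by `shift`.
    """
    w_star = 0.0  # minimizer of the unshifted train loss
    train = quadratic_loss(w_star, curvature, center=0.0)
    test = quadratic_loss(w_star, curvature, center=shift)
    return test - train


# Same shift, two minima that differ only in curvature (illustrative values).
flat_gap = loss_increase_under_shift(curvature=0.1, shift=0.5)
sharp_gap = loss_increase_under_shift(curvature=10.0, shift=0.5)

print(f"flat: {flat_gap:.4f}, sharp: {sharp_gap:.4f}")
# prints "flat: 0.0125, sharp: 1.2500"
```

Under these assumptions the gap equals half the curvature times the squared shift, so it grows linearly with curvature: the same shift costs the sharp minimum a hundred times more loss than the flat one. Real loss landscapes are high-dimensional and not purely shifted, so this only illustrates the intuition.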