When Do Flat Minima Optimizers Work?

Neural Information Processing Systems 

Theoretical and empirical studies [21,77,9,55,49,5,12] postulate that such flatter regions generalize better than sharper minima, e.g., due to the flat minimizer's robustness against loss function shifts between train and test data, as illustrated in Fig. 1.
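
The robustness argument can be made concrete with a toy one-dimensional sketch. This is an illustration only and is not taken from the paper: the quadratic loss, the curvature values, and the shift size are all assumptions, as are the function names `quadratic_loss` and `loss_increase_under_shift`. The test loss is modeled as the training loss shifted horizontally, and the code measures how much the loss rises at the training minimizer.

```python
def quadratic_loss(w, center, curvature):
    """Toy 1-D loss (an assumption): a parabola with its minimum at `center`."""
    return 0.5 * curvature * (w - center) ** 2


def loss_increase_under_shift(center, curvature, shift):
    """Rise in loss at the training minimizer when the test loss is the
    training loss shifted horizontally by `shift`."""
    train_minimizer = center  # where the training loss is smallest
    train_loss = quadratic_loss(train_minimizer, center, curvature)
    test_loss = quadratic_loss(train_minimizer, center + shift, curvature)
    return test_loss - train_loss


shift = 0.5  # hypothetical train/test mismatch
sharp_gap = loss_increase_under_shift(center=0.0, curvature=10.0, shift=shift)
flat_gap = loss_increase_under_shift(center=0.0, curvature=0.1, shift=shift)

print(f"sharp minimum: loss increase = {sharp_gap:.4f}")
print(f"flat minimum:  loss increase = {flat_gap:.4f}")
```

In this toy setting the increase equals half the curvature times the squared shift, so the low-curvature (flat) minimum is penalized far less by the same shift than the high-curvature (sharp) one.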
