




Function Independent

Neural Information Processing Systems

Let t be another F analytic in t such that 0 = f0 and ( t) = ( ft) for all t. In the right figure a volume-preserving transformation is applied (see Appendix C). 3: Smooth invariant deformations.






When Do Flat Minima Optimizers Work?

Neural Information Processing Systems

Theoretical and empirical studies [21, 77, 9, 55, 49, 5, 12] postulate that such flatter regions generalize better than sharper minima, e.g., due to the flat minimizer's robustness against loss function shifts between train and test data, as illustrated in Fig. 1.