An SDE for Modeling SAM: Theory and Insights
Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien Lucchi
arXiv.org Artificial Intelligence
We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted considerable interest due to its improved performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, in both the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, with error scaling linearly in the learning rate). Using these models, we then explain why SAM prefers flat minima over sharp ones: it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments.
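For context, the discrete-time SAM update that the paper's SDEs approximate takes a gradient ascent step of radius rho toward higher loss, then descends using the gradient at that perturbed point. A minimal sketch in NumPy (the function names, the toy quadratic loss, and the hyperparameter values are illustrative assumptions, not from the paper):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One (full-batch) SAM step, sketched: move to the approximate
    worst-case point within a rho-ball, then descend with the
    gradient evaluated there."""
    g = grad_fn(w)
    # Normalized ascent perturbation toward higher loss.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descend using the gradient at the perturbed point.
    return w - lr * grad_fn(w + eps)

# Toy example: L(w) = 0.5 * ||w||^2, so grad L(w) = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn)
```

Evaluating the gradient at the perturbed point w + eps, rather than at w itself, is what makes the effective loss sharpness-aware; the paper's continuous-time models capture how this perturbation induces the implicit regularization and Hessian-dependent noise described in the abstract.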
Jun-4-2023