An SDE for Modeling SAM: Theory and Insights
Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien Lucchi
arXiv.org Artificial Intelligence
We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted considerable interest due to its improved performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, in both the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, with error scaling linearly in the learning rate). Using these models, we then explain why SAM prefers flat minima over sharp ones: it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments.
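For context, the discrete-time SAM update that the paper's SDEs approximate takes a gradient ascent step of radius rho toward higher loss, then descends using the gradient at that perturbed point. A minimal sketch in NumPy (the function names, the toy quadratic loss, and the hyperparameter values are illustrative assumptions, not from the paper):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One (full-batch) SAM step, sketched: move to the approximate
    worst-case point within a rho-ball, then descend with the
    gradient evaluated there."""
    g = grad_fn(w)
    # Normalized ascent perturbation toward higher loss.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descend using the gradient at the perturbed point.
    return w - lr * grad_fn(w + eps)

# Toy example: L(w) = 0.5 * ||w||^2, so grad L(w) = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn)
```

Evaluating the gradient at the perturbed point w + eps, rather than at w itself, is what makes the effective loss sharpness-aware; the paper's continuous-time models capture how this perturbation induces the implicit regularization and Hessian-dependent noise described in the abstract.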
Jun-4-2023