Sharpness-Aware Minimization and the Edge of Stability

Oct-30-2023–arXiv.org Machine Learning

Sharpness-Aware Minimization (SAM) [Foret et al., 2020] is a gradient-based neural network training algorithm that advanced the state-of-the-art test accuracy on a number of prominent benchmark datasets. As its name suggests, it explicitly seeks a solution that not only fits the training data but also avoids "sharp" minima, for which nearby parameter vectors perform poorly. SAM is an incremental algorithm that updates its parameters using a gradient computed at a neighbor of the current solution. The neighbor is the point in parameter space found by taking a step of length ρ "uphill" in the gradient direction. The practical success of SAM has motivated theoretical research [Bartlett et al., 2022, Wen et al., 2023, Andriushchenko et al., 2023], including results highlighting senses in which SAM's update may be viewed, under certain conditions, as including a component that performs gradient descent on the operator norm of the Hessian [Bartlett et al., 2022, Wen et al., 2023].

Meanwhile, Cohen et al. [2021], building on the work of Jastrzebski et al. [2020] and others, exposed a striking phenomenon regarding neural network training with the original gradient descent (GD) method: for many initialization schemes and learning rates η, the operator norm of the Hessian eventually settles in the neighborhood of 2/η. This has been termed the "edge of stability", in part because a convex quadratic trained by gradient descent with a learning rate η will only [...]

Author footnote: Also affiliated with University of California, Berkeley.
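The two ideas in the abstract can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a quadratic loss with a diagonal Hessian, L(w) = 0.5 * sum(h_i * w_i^2), and the function names (`quad_grad`, `gd_step`, `sam_step`, `run_gd_1d`) are hypothetical. It shows a SAM-style update as described above (a gradient evaluated at the neighbor reached by a step of length ρ along the normalized gradient) and the standard fact that plain GD on a one-dimensional quadratic with curvature h shrinks the iterate only when h < 2/η.

```python
import math

def quad_grad(w, h):
    # Gradient of the assumed toy loss L(w) = 0.5 * sum(h_i * w_i**2).
    return [hi * wi for hi, wi in zip(h, w)]

def gd_step(w, h, eta):
    # Plain gradient descent: w <- w - eta * grad L(w).
    g = quad_grad(w, h)
    return [wi - eta * gi for wi, gi in zip(w, g)]

def sam_step(w, h, eta, rho):
    # SAM-style step: move a distance rho "uphill" along the normalized
    # gradient, then update w with the gradient computed at that neighbor.
    g = quad_grad(w, h)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # at a stationary point there is no uphill direction
    neighbor = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_neighbor = quad_grad(neighbor, h)
    return [wi - eta * gi for wi, gi in zip(w, g_neighbor)]

def run_gd_1d(curvature, eta, steps=200, w0=1.0):
    # On a 1-D quadratic each GD step multiplies w by (1 - eta * curvature),
    # so |w| shrinks exactly when curvature < 2 / eta.
    w = [w0]
    for _ in range(steps):
        w = gd_step(w, [curvature], eta)
    return w[0]

if __name__ == "__main__":
    eta = 0.1  # threshold 2 / eta is 20 (up to floating-point rounding)
    print("threshold 2/eta:", 2 / eta)
    print("curvature 19:", run_gd_1d(19.0, eta))
    print("curvature 21:", run_gd_1d(21.0, eta))
    print("one SAM step:", sam_step([1.0, 0.0], [2.0, 1.0], eta, rho=0.5))
```

In this toy setting the curvature is fixed, so it only illustrates why 2/η is a natural stability threshold; it does not reproduce the neural network behavior reported by Cohen et al. [2021], where the Hessian norm itself changes during training.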

artificial intelligence, machine learning, stability

Country:
- North America > United States > California
  - Alameda County > Berkeley (0.24)
  - Santa Clara County > Mountain View (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.75)
  - Neural Networks > Deep Learning (0.68)