Sharpness-Aware Minimization and the Edge of Stability
Long, Philip M.; Bartlett, Peter L.
Sharpness-Aware Minimization (SAM) [Foret et al., 2020] is a new gradient-based algorithm for training neural networks that has advanced state-of-the-art test accuracy on a number of prominent benchmark datasets. As its name suggests, it explicitly seeks a solution that not only fits the training data but also avoids "sharp" minima, for which nearby parameter vectors perform poorly. SAM is an incremental algorithm that updates its parameters using a gradient computed at a neighbor of the current solution: the point in parameter space reached by taking a step of length ρ "uphill" in the gradient direction. The practical success of SAM has motivated theoretical research [Bartlett et al., 2022, Wen et al., 2023, Andriushchenko et al., 2023], including results showing senses in which, under certain conditions, SAM's update may be viewed as including a component that performs gradient descent on the operator norm of the Hessian [Bartlett et al., 2022, Wen et al., 2023]. Meanwhile, Cohen et al. [2021], building on the work of Jastrzebski et al. [2020] and others, exposed a striking phenomenon in neural network training with the original gradient descent (GD) method: for many initialization schemes and learning rates η, the operator norm of the Hessian eventually settles in the neighborhood of 2/η. This has been termed the "edge of stability", in part because gradient descent with learning rate η on a convex quadratic converges only if the operator norm of the Hessian is less than 2/η.
Also affiliated with University of California, Berkeley.
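To make the two updates concrete, here is a minimal sketch on a convex quadratic f(w) = ½ Σᵢ λᵢ wᵢ² with a diagonal Hessian (an assumption chosen so the eigenvalues are explicit). The SAM step follows the description above: move a distance ρ along the normalized gradient to a neighbor, then take the descent step from the current point using the gradient evaluated at that neighbor. The helper names `grad_quadratic`, `gd_step`, `sam_step`, and `run` are hypothetical and for illustration only; this is not the authors' code.

```python
import math


def grad_quadratic(w, eigs):
    # Gradient of f(w) = 0.5 * sum_i eigs[i] * w[i]**2 (diagonal Hessian; assumed toy objective).
    return [lam * wi for lam, wi in zip(eigs, w)]


def gd_step(w, eigs, eta):
    # Plain gradient descent: w <- w - eta * grad f(w).
    g = grad_quadratic(w, eigs)
    return [wi - eta * gi for wi, gi in zip(w, g)]


def sam_step(w, eigs, eta, rho):
    # SAM-style step: gradient at the neighbor w + rho * g / ||g||, applied at w.
    g = grad_quadratic(w, eigs)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # At a stationary point the "uphill" direction is undefined; stay put.
        return list(w)
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]  # step of length rho uphill
    g_adv = grad_quadratic(w_adv, eigs)                      # gradient at the neighbor
    return [wi - eta * gi for wi, gi in zip(w, g_adv)]


def run(step, w0, steps, **kw):
    # Iterate an update rule and return the Euclidean norm of the final iterate.
    w = list(w0)
    for _ in range(steps):
        w = step(w, **kw)
    return math.sqrt(sum(wi * wi for wi in w))
```

With η = 0.1 the threshold 2/η is 20. Along an eigen-direction with eigenvalue λ, each GD step multiplies that coordinate by 1 − ηλ. An eigenvalue of 19 gives a factor of −0.9, so the iterates shrink. An eigenvalue of 21 gives a factor of −1.1, so they grow without bound. For example, `run(gd_step, [1.0, 1.0], 200, eigs=[21.0, 1.0], eta=0.1)` returns a norm on the order of 10⁸.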
Oct-30-2023