The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Bartlett, Peter L., Long, Philip M., Bousquet, Olivier
arXiv.org Artificial Intelligence
The broad practical impact of deep learning has heightened interest in many of its surprising characteristics: simple gradient methods applied to deep neural networks seem to efficiently optimize nonconvex criteria, reliably giving a near-perfect fit to training data, yet exhibiting good predictive accuracy nonetheless [see Bartlett et al., 2021]. Optimization methodology is widely believed to affect statistical performance by imposing some kind of implicit regularization, and there has been considerable effort devoted to understanding the behavior of optimization methods and the nature of the solutions that they find. For instance, Barrett and Dherin [2020] and Smith et al. [2021] show that discrete-time gradient descent and stochastic gradient descent can be viewed as gradient flow methods applied to penalized losses that encourage smoothness, and Soudry et al. [2018] and Azulay et al. [2021] identify the implicit regularization imposed by gradient flow in specific examples, including linear networks. We consider Sharpness-Aware Minimization (SAM), a recently introduced [Foret et al., 2021] gradient optimization method that has exhibited substantial improvements in prediction performance for deep networks applied to image classification [Foret et al., 2021] and NLP [Bahri et al., 2022] problems.
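For readers unfamiliar with SAM, the update rule of Foret et al. [2021] first perturbs the parameters in the (normalized) gradient direction to a point of approximately highest loss within a ball of radius rho, then applies a descent step using the gradient evaluated at that perturbed point. The sketch below is a minimal illustration of one such step, assuming a generic `loss_grad` function supplied by the user; the function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def sam_step(w, loss_grad, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) step, following Foret et al. (2021).

    w         : current parameter vector (np.ndarray)
    loss_grad : function mapping parameters to the gradient of the training loss
    lr        : learning rate for the descent step
    rho       : radius of the adversarial (ascent) perturbation
    """
    g = loss_grad(w)
    # Ascent step: move to the approximate worst point in an l2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: apply the gradient evaluated at the perturbed parameters.
    return w - lr * loss_grad(w + eps)

# Example usage on a simple quadratic loss L(w) = 0.5 * ||w||^2 (hypothetical):
w = np.array([1.0, -2.0])
w_next = sam_step(w, loss_grad=lambda v: v)
```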
Apr-11-2023