A Rod Flow Model for Adam at the Edge of Stability

May-11-2026–arXiv.org Machine Learning

Neural networks are trained by minimizing loss functions with gradient-based optimizers. Cohen et al. [2021] observed that full-batch gradient descent operates at the edge of stability (EoS): the largest eigenvalue of the Hessian, called the sharpness, first rises (a phase called progressive sharpening) and then hovers at the stability threshold 2/η where η is the learning rate. Cohen et al. [2022] extended this picture to momentum methods and adaptive gradient methods, showing that each optimizer exhibits its own edge of stability. Rather than hovering at 2/η, the relevant quantity--the preconditioned sharpness--hovers at a hyperparameter-dependent threshold that depends on the optimizer (Table 2). In practice, the dominant optimizer in machine learning is Adam [Kingma and Ba, 2015], which differs from gradient descent in two respects.

artificial intelligence, equation, machine learning, (15 more...)

arXiv.org Machine Learning

May-11-2026

arXiv.org PDF

Add feedback

Country:
- North America (0.28)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (1.00)
  - Statistical Learning > Gradient Descent (0.70)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found