On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Neural Information Processing Systems
In this paper, we study Adam in non-convex smooth scenarios with potentially unbounded gradients and affine variance noise. We consider a general noise model that covers affine variance noise, bounded noise, and sub-Gaussian noise as special cases. We show that Adam with a specific hyper-parameter setup can find a stationary point at an $\mathcal{O}(\mathrm{poly}(\log T)/\sqrt{T})$ rate with high probability under this general noise model, where $T$ denotes the total number of iterations, matching the lower bound for stochastic first-order algorithms up to logarithmic factors.
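For context, a minimal sketch of the affine variance noise condition as it is commonly stated in this literature is given below; the symbols $\sigma_0$ and $\sigma_1$ and the exact form are standard-usage assumptions, not quoted from the paper.

\[
  \mathbb{E}\big[\,\|g_t - \nabla f(x_t)\|^2 \mid x_t\,\big] \;\le\; \sigma_0^2 + \sigma_1^2\,\|\nabla f(x_t)\|^2,
\]

where $g_t$ is the stochastic gradient queried at iterate $x_t$. Setting $\sigma_1 = 0$ recovers the classical bounded-variance assumption, while the sub-Gaussian case strengthens the expectation bound to a tail bound; this is why such a condition can serve as the general noise model referred to in the abstract.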