On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Neural Information Processing Systems
In this paper, we study Adam in non-convex smooth scenarios with potentially unbounded gradients and affine variance noise. We consider a general noise model that covers affine variance noise, bounded noise, and sub-Gaussian noise as special cases. We show that Adam with a specific hyper-parameter setup can find a stationary point at an $\mathcal{O}(\mathrm{poly}(\log T)/\sqrt{T})$ rate with high probability under this general noise model, where $T$ denotes the total number of iterations, matching the lower bound for stochastic first-order algorithms up to logarithmic factors.
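For context, a minimal sketch of the affine variance noise condition as it is commonly stated in this literature is given below; the symbols $\sigma_0$ and $\sigma_1$ and the exact form are standard-usage assumptions, not quoted from the paper.

\[
  \mathbb{E}\big[\,\|g_t - \nabla f(x_t)\|^2 \mid x_t\,\big] \;\le\; \sigma_0^2 + \sigma_1^2\,\|\nabla f(x_t)\|^2,
\]

where $g_t$ is the stochastic gradient queried at iterate $x_t$. Setting $\sigma_1 = 0$ recovers the classical bounded-variance assumption, while the sub-Gaussian case strengthens the expectation bound to a tail bound; this is why such a condition can serve as the general noise model referred to in the abstract.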