High Probability Convergence of Adam Under Unbounded Gradients and Affine Variance Noise

Yusu Hong, Junhong Lin

arXiv.org Machine Learning 

In this paper, we study the convergence of the Adaptive Moment Estimation (Adam) algorithm for unconstrained non-convex smooth stochastic optimization. Despite its widespread use in machine learning, Adam's theoretical properties remain limited. Prior research primarily investigated Adam's convergence in expectation, often requiring strong assumptions such as uniformly bounded stochastic gradients or a priori problem-dependent knowledge. As a result, the applicability of these findings to practical real-world scenarios has been constrained. To overcome these limitations, we provide a detailed analysis and show that Adam converges to a stationary point with high probability at a rate of $\mathcal{O}\left(\mathrm{poly}(\log T)/\sqrt{T}\right)$ under coordinate-wise "affine" variance noise, without requiring any bounded gradient assumption or any a priori problem-dependent knowledge to tune hyper-parameters. Additionally, it is revealed that Adam confines its gradients' magnitudes to an order of $\mathcal{O}(\mathrm{poly}(\log T))$. Finally, we also investigate a simplified version of Adam without one of the corrective terms and obtain a convergence rate that is adaptive to the noise level.
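
For reference, below is a minimal sketch of the standard coordinate-wise Adam update that the abstract refers to, with the two bias-correction ("corrective") terms marked; the simplified variant mentioned in the abstract drops one of them. The toy objective, the affine-style noise model, and all hyper-parameter values here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def adam(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, T=1000):
    """Standard Adam on a stochastic gradient oracle grad_fn."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment (momentum) estimate
    v = np.zeros_like(x)   # second-moment estimate
    for t in range(1, T + 1):
        g = grad_fn(x)                      # stochastic gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)          # corrective term for m
        v_hat = v / (1 - beta2**t)          # corrective term for v; the
                                            # simplified variant studied in
                                            # the paper omits one of these
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Illustrative usage on a toy quadratic whose gradient noise scales
# affinely with the gradient magnitude (mimicking affine variance noise).
rng = np.random.default_rng(0)
noisy_grad = lambda x: (2 * x
                        + 0.1 * np.abs(2 * x) * rng.standard_normal(x.shape)
                        + 0.1 * rng.standard_normal(x.shape))
print(adam(noisy_grad, x0=np.ones(3), T=5000))
```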
