High Probability Convergence of Adam Under Unbounded Gradients and Affine Variance Noise

Yusu Hong, Junhong Lin

arXiv.org Machine Learning 

In this paper, we study the convergence of the Adaptive Moment Estimation (Adam) algorithm for unconstrained non-convex smooth stochastic optimization. Despite its widespread use in machine learning, Adam's theoretical properties remain limited. Prior research primarily investigated Adam's convergence in expectation, often requiring strong assumptions such as uniformly bounded stochastic gradients or a priori problem-dependent knowledge. As a result, the applicability of these findings to practical real-world scenarios has been constrained. To overcome these limitations, we provide a detailed analysis and show that Adam converges to a stationary point with high probability at a rate of $\mathcal{O}\left(\mathrm{poly}(\log T)/\sqrt{T}\right)$ under coordinate-wise "affine" variance noise, without requiring any bounded gradient assumption or any a priori problem-dependent knowledge to tune hyper-parameters. Additionally, it is revealed that Adam confines its gradients' magnitudes to an order of $\mathcal{O}(\mathrm{poly}(\log T))$. Finally, we also investigate a simplified version of Adam without one of the corrective terms and obtain a convergence rate that is adaptive to the noise level.
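
For reference, below is a minimal sketch of the standard coordinate-wise Adam update that the abstract refers to, with the two bias-correction ("corrective") terms marked; the simplified variant mentioned in the abstract drops one of them. The toy objective, the affine-style noise model, and all hyper-parameter values here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def adam(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, T=1000):
    """Standard Adam on a stochastic gradient oracle grad_fn."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment (momentum) estimate
    v = np.zeros_like(x)   # second-moment estimate
    for t in range(1, T + 1):
        g = grad_fn(x)                      # stochastic gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)          # corrective term for m
        v_hat = v / (1 - beta2**t)          # corrective term for v; the
                                            # simplified variant studied in
                                            # the paper omits one of these
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Illustrative usage on a toy quadratic whose gradient noise scales
# affinely with the gradient magnitude (mimicking affine variance noise).
rng = np.random.default_rng(0)
noisy_grad = lambda x: (2 * x
                        + 0.1 * np.abs(2 * x) * rng.standard_normal(x.shape)
                        + 0.1 * rng.standard_normal(x.shape))
print(adam(noisy_grad, x0=np.ones(3), T=5000))
```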
