Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees
Xiao, Nachuan, Hu, Xiaoyin, Liu, Xin, Toh, Kim-Chuan
arXiv.org Artificial Intelligence
The unconstrained nonsmooth optimization problem (UNP) has numerous important applications in machine learning and data science, especially in the training of deep neural networks. In these applications, we usually only have access to stochastic evaluations of the exact gradient of f. Stochastic gradient descent (SGD) is one of the most popular methods for solving UNP, and incorporating momentum terms into SGD for acceleration is also common in practice. The update rule of SGD depends on the stepsizes (i.e., learning rates), where every coordinate of the variable x shares the same stepsize. Recently, a variety of accelerated variants of SGD have been proposed. In particular, the widely used Adam algorithm (Kingma and Ba, 2015) is built on the adaptive adjustment of coordinate-wise stepsizes together with the incorporation of momentum terms in each iteration; these enhancements account for its high efficiency in practice. Motivated by Adam, a number of efficient Adam-family methods have been developed, such as AdaBelief (Zhuang et al., 2020), AMSGrad (Reddi et al., 2018), NAdam (Dozat, 2016), and Yogi (Zaheer et al., 2018). Regarding the convergence properties of these Adam-family methods, Kingma and Ba (2015) establish convergence for Adam under a constant stepsize and a globally Lipschitz objective function f.
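To make the contrast with plain SGD concrete, here is a minimal NumPy sketch of one Adam iteration following Kingma and Ba (2015): the first-moment estimate m carries the momentum, while the coordinate-wise second-moment estimate v gives each coordinate of x its own effective stepsize. The toy gradient oracle in the usage section is an illustrative assumption, not part of the paper.

```python
import numpy as np

def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam iteration (Kingma and Ba, 2015).

    m, v : first- and second-moment estimates, same shape as x
    t    : 1-based iteration counter, used for bias correction
    """
    # Momentum: exponential moving average of stochastic gradients.
    m = beta1 * m + (1.0 - beta1) * grad
    # Coordinate-wise moving average of squared gradients.
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias-corrected moment estimates.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Each coordinate i gets its own stepsize lr / (sqrt(v_hat[i]) + eps),
    # unlike SGD, where all coordinates share a single stepsize.
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Illustrative usage with a toy noisy gradient oracle (an assumption made
# for demonstration only).
rng = np.random.default_rng(0)
x, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 101):
    # Noisy gradient of the smooth test objective ||x - 0.5||^2.
    grad = 2.0 * (x - 0.5) + 0.01 * rng.standard_normal(3)
    x, m, v = adam_step(x, grad, m, v, t)
```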
May-6-2023