Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees
Xiao, Nachuan, Hu, Xiaoyin, Liu, Xin, Toh, Kim-Chuan
arXiv.org Artificial Intelligence
The unconstrained nonsmooth optimization problem (UNP) has numerous important applications in machine learning and data science, especially in the training of deep neural networks. In these applications, we usually only have access to stochastic evaluations of the exact gradient of f. Stochastic gradient descent (SGD) is one of the most popular methods for solving UNP, and incorporating momentum terms into SGD for acceleration is also common in practice. The update rule of SGD depends on the stepsizes (i.e., learning rates), where every coordinate of the variable x shares the same stepsize. Recently, a variety of accelerated variants of SGD have been proposed. In particular, the widely used Adam algorithm (Kingma and Ba, 2015) is built on the adaptive adjustment of coordinate-wise stepsizes together with the incorporation of momentum terms in each iteration; these enhancements account for its high efficiency in practice. Motivated by Adam, a number of efficient Adam-family methods have been developed, such as AdaBelief (Zhuang et al., 2020), AMSGrad (Reddi et al., 2018), NAdam (Dozat, 2016), and Yogi (Zaheer et al., 2018). Regarding the convergence properties of these Adam-family methods, Kingma and Ba (2015) establish convergence for Adam under a constant stepsize and a globally Lipschitz objective function f.
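To make the contrast with plain SGD concrete, here is a minimal NumPy sketch of one Adam iteration following Kingma and Ba (2015): the first-moment estimate m carries the momentum, while the coordinate-wise second-moment estimate v gives each coordinate of x its own effective stepsize. The toy gradient oracle in the usage section is an illustrative assumption, not part of the paper.

```python
import numpy as np

def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam iteration (Kingma and Ba, 2015).

    m, v : first- and second-moment estimates, same shape as x
    t    : 1-based iteration counter, used for bias correction
    """
    # Momentum: exponential moving average of stochastic gradients.
    m = beta1 * m + (1.0 - beta1) * grad
    # Coordinate-wise moving average of squared gradients.
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias-corrected moment estimates.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Each coordinate i gets its own stepsize lr / (sqrt(v_hat[i]) + eps),
    # unlike SGD, where all coordinates share a single stepsize.
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Illustrative usage with a toy noisy gradient oracle (an assumption made
# for demonstration only).
rng = np.random.default_rng(0)
x, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 101):
    # Noisy gradient of the smooth test objective ||x - 0.5||^2.
    grad = 2.0 * (x - 0.5) + 0.01 * rng.standard_normal(3)
    x, m, v = adam_step(x, grad, m, v, t)
```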
May-6-2023