FastAdaBelief: Improving Convergence Rate for Belief-based Adaptive Optimizer by Strong Convexity

Yangfan Zhou, Kaizhu Huang, Cheng Cheng, Xuguang Wang, Xin Liu

arXiv.org Machine Learning 

The AdaBelief algorithm demonstrates superior generalization ability over the Adam algorithm by viewing the exponential moving average of observed gradients as a prediction of the next gradient and adapting its step size according to the deviation from this prediction. AdaBelief is proved to have a data-dependent $O(\sqrt{T})$ regret bound when the objective functions are convex, where $T$ is the time horizon. However, whether strong convexity can be exploited to further improve the convergence rate of AdaBelief remains an open problem. To tackle this problem, we present FastAdaBelief, a novel optimization algorithm for strongly convex objectives. We prove that FastAdaBelief attains a data-dependent $O(\log T)$ regret bound, which is substantially lower than that of AdaBelief. In addition, the theoretical analysis is validated by extensive experiments on open datasets (i.e., CIFAR-10 and Penn Treebank) for image classification and language modeling.

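To make the belief-based update concrete, the sketch below combines the publicly known AdaBelief recursion (second moment built from the deviation $g_t - m_t$, the "belief") with a $1/t$ step-size decay of the kind typically used to obtain $O(\log T)$ regret under strong convexity. This is not the authors' exact FastAdaBelief algorithm, which is not given in this abstract; the function name, hyperparameters, and step schedule are illustrative assumptions only.

```python
import numpy as np

def fastadabelief_sketch(grad_fn, theta0, steps=1000, alpha=0.1,
                         beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative belief-based adaptive update (NOT the paper's exact method).

    Combines the AdaBelief-style second moment (built from g_t - m_t)
    with a 1/t decaying step size, the standard device for logarithmic
    regret under strong convexity.
    """
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)   # EMA of gradients (prediction of g_t)
    s = np.zeros_like(theta)   # EMA of squared deviations (g_t - m_t)^2
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * (g - m) ** 2 + eps
        step = alpha / t       # decaying step size motivated by strong convexity
        theta = theta - step * m / (np.sqrt(s) + eps)
    return theta

# Usage: minimize the strongly convex quadratic f(x) = ||x||^2 / 2, whose gradient is x.
if __name__ == "__main__":
    x_final = fastadabelief_sketch(lambda x: x, theta0=np.ones(5))
    print(x_final)  # entries should be close to zero
```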