Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

Yuanzhe Tao, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu

arXiv.org Machine Learning 

In recent years, optimization algorithms such as AdaGrad (Duchi et al., 2011) and Adam (Kingma, 2014) have emerged as powerful tools for enhancing the training of deep learning models by efficiently adapting the learning rate during the optimization process. While these algorithms have demonstrated remarkable performance gains in various applications, a notable drawback lies in the necessity of manual tuning for suitable learning rates. The process of learning rate tuning can be laborious and often requires extensive trial and error, hindering the efficiency and scalability of deep learning model development. The intricate nature of learning rate tuning has motivated a large number of recent works to develop "learning-rate-free" or "parameter-free" algorithms that can work well across a variety of settings without learning rate tuning. Among the vast literature of parameter-free optimization methods, Ivgi et al. (2023) proposed a framework called distance over gradients (DoG), which gives a parameter-free version of stochastic gradient descent (SGD) that shares certain features with the AdaGrad-Norm algorithm (Streeter and McMahan, 2010; Ward et al., 2020).
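To make the DoG idea referenced above concrete: the step size at each iteration is formed from the maximum distance travelled from the initial point divided by the square root of the accumulated squared gradient norms, which resembles an AdaGrad-Norm step size with the tuned scale replaced by a measured quantity. The following is a minimal Python sketch of that update under stated assumptions; the function name `dog_sgd`, the `grad_fn` interface, and the small initial movement constant `r_eps` are illustrative choices, not the authors' implementation.

```python
import numpy as np

def dog_sgd(grad_fn, x0, num_steps=1000, r_eps=1e-6):
    """Sketch of a distance-over-gradients (DoG) style update (Ivgi et al., 2023).

    The step size at iteration t is r_bar_t / sqrt(sum of squared gradient norms),
    where r_bar_t tracks the maximum distance from the initial point x0.
    No learning rate is tuned; r_eps only prevents a zero step at the start.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    r_bar = r_eps          # assumed small initial "movement" constant
    grad_sq_sum = 0.0      # running sum of squared gradient norms
    for _ in range(num_steps):
        g = grad_fn(x)
        grad_sq_sum += float(np.dot(g, g))
        eta = r_bar / np.sqrt(grad_sq_sum)            # parameter-free step size
        x = x - eta * g
        r_bar = max(r_bar, float(np.linalg.norm(x - x0)))  # distance from start
    return x

# Usage: minimize a simple quadratic with minimum at 3 without tuning a learning rate.
if __name__ == "__main__":
    x_out = dog_sgd(lambda x: 2.0 * (x - 3.0), x0=np.zeros(1))
    print(x_out)  # should approach 3.0
```

By contrast, AdaGrad-Norm uses the same accumulated gradient denominator but keeps a fixed numerator that must be tuned; DoG replaces that tuned constant with the observed distance travelled, which is what makes it parameter-free.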
