AITopics | warmup schedule

Collaborating Authors

warmup schedule

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence

Liu, Yuxing, Ge, Yuze, Pan, Rui, Kang, An, Zhang, Tong

arXiv.org Artificial IntelligenceSep-10-2025

Learning rate warmup is a popular and practical technique in training large-scale deep neural networks. Despite the huge success in practice, the theoretical advantages of this strategy of gradually increasing the learning rate at the beginning of the training process have not been fully understood. To resolve this gap between theory and practice, we first propose a novel family of generalized smoothness assumptions, and validate its applicability both theoretically and empirically. Under the novel smoothness assumption, we study the convergence properties of gradient descent (GD) in both deterministic and stochastic settings. It is shown that learning rate warmup consistently accelerates GD, and GD with warmup can converge at most $Θ(T)$ times faster than with a non-increasing learning rate schedule in some specific cases, providing insights into the benefits of this strategy from an optimization theory perspective.

artificial intelligence, learning rate, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.07972

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

Fast Certified Robust Training via Better Initialization and Shorter Warmup

Shi, Zhouxing, Wang, Yihan, Zhang, Huan, Yi, Jinfeng, Hsieh, Cho-Jui

arXiv.org Artificial IntelligenceApr-1-2021

Recently, bound propagation based certified adversarial defense have been proposed for training neural networks with certifiable robustness guarantees. Despite state-of-the-art (SOTA) methods including interval bound propagation (IBP) and CROWN-IBP have per-batch training complexity similar to standard neural network training, to reach SOTA performance they usually need a long warmup schedule with hundreds or thousands epochs and are thus still quite costly for training. In this paper, we discover that the weight initialization adopted by prior works, such as Xavier or orthogonal initialization, which was originally designed for standard network training, results in very loose certified bounds at initialization thus a longer warmup schedule must be used. We also find that IBP based training leads to a significant imbalance in ReLU activation states, which can hamper model performance. Based on our findings, we derive a new IBP initialization as well as principled regularizers during the warmup stage to stabilize certified bounds during initialization and warmup stage, which can significantly reduce the warmup schedule and improve the balance of ReLU activation states. Additionally, we find that batch normalization (BN) is a crucial architectural element to build best-performing networks for certified training, because it helps stabilize bound variance and balance ReLU activation states. With our proposed initialization, regularizers and architectural changes combined, we are able to obtain 65.03% verified error on CIFAR-10 ($\epsilon=\frac{8}{255}$) and 82.13% verified error on TinyImageNet ($\epsilon=\frac{1}{255}$) using very short training schedules (160 and 80 total epochs, respectively), outperforming literature SOTA trained with a few hundreds or thousands epochs.

epoch, initialization, regularizer, (14 more...)

arXiv.org Artificial Intelligence

2103.17268

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Italy > Sardinia (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

On the adequacy of untuned warmup for adaptive optimization

Ma, Jerry, Yarats, Denis

arXiv.org Machine LearningOct-9-2019

Adaptive optimization algorithms such as Adam (Kingma & Ba, 2014) are widely used in deep learning. The stability of such algorithms is often improved with a warmup schedule for the learning rate. Motivated by the difficulty of choosing and tuning warmup schedules, Liu et al. (2019) propose automatic variance rectification of Adam's adaptive learning rate, claiming that this rectified approach ("RAdam") surpasses the vanilla Adam algorithm and reduces the need for expensive tuning of Adam with warmup. In this work, we point out various shortcomings of this analysis. We then provide an alternative explanation for the necessity of warmup based on the magnitude of the update term, which is of greater relevance to training stability. Finally, we provide some "rule-of-thumb" warmup schedules, and we demonstrate that simple untuned warmup of Adam performs more-or-less identically to RAdam in typical practical settings. We conclude by suggesting that practitioners stick to linear warmup with Adam, with a sensible default being linear warmup over $2 / (1 - \beta_2)$ training iterations.

iteration, warmup, warmup schedule, (17 more...)

arXiv.org Machine Learning

1910.04209

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback