AITopics | stepsize schedule

Collaborating Authors

stepsize schedule

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Min-Max Optimization Is Strictly Easier Than Variational Inequalities

Shugart, Henry, Altschuler, Jason M.

arXiv.org Artificial IntelligenceNov-6-2025

Classically, a mainstream approach for solving a convex-concave min-max problem is to instead solve the variational inequality problem arising from its first-order optimality conditions. Is it possible to solve min-max problems faster by bypassing this reduction? This paper initiates this investigation. We show that the answer is yes in the textbook setting of unconstrained quadratic objectives: the optimal convergence rate for first-order algorithms is strictly better for min-max problems than for the corresponding variational inequalities. The key reason that min-max algorithms can be faster is that they can exploit the asymmetry of the min and max variables--a property that is lost in the reduction to variational inequalities. Central to our analyses are sharp characterizations of optimal convergence rates in terms of extremal polynomials which we compute using Green's functions and conformal mappings.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.03052

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Negative Stepsizes Make Gradient-Descent-Ascent Converge

Shugart, Henry, Altschuler, Jason M.

arXiv.org Artificial IntelligenceMay-5-2025

Efficient computation of min-max problems is a central question in optimization, learning, games, and controls. Arguably the most natural algorithm is gradient-descent-ascent (GDA). However, since the 1970s, conventional wisdom has argued that GDA fails to converge even on simple problems. This failure spurred an extensive literature on modifying GDA with additional building blocks such as extragradients, optimism, momentum, anchoring, etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes. The key innovation is the proposal of unconventional stepsize schedules (dubbed slingshot stepsize schedules) that are time-varying, asymmetric, and periodically negative. We show that all three properties are necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems). All of our results apply to the last iterate of GDA, as is typically desired in practice. The core algorithmic intuition is that although negative stepsizes make backward progress, they de-synchronize the min and max variables (overcoming the cycling issue of GDA), and lead to a slingshot phenomenon in which the forward progress in the other iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, the slingshot dynamics leverage the non-reversibility of gradient flow: positive/negative steps cancel to first order, yielding a second-order net movement in a new direction that leads to convergence and is otherwise impossible for GDA to move in. We interpret this as a second-order finite-differencing algorithm and show that, intriguingly, it approximately implements consensus optimization, an empirically popular algorithm for min-max problems involving deep neural networks (e.g., training GANs).

artificial intelligence, machine learning, stepsize schedule, (18 more...)

arXiv.org Artificial Intelligence

2505.01423

Country: Asia (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Sports (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

Anytime Acceleration of Gradient Descent

Zhang, Zihan, Lee, Jason D., Du, Simon S., Chen, Yuxin

arXiv.org Machine LearningDec-8-2024

This work investigates stepsize-based acceleration of gradient descent with {\em anytime} convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve convergence guarantees of $O(T^{-1.119})$ for any stopping time $T$, where the stepsize schedule is predetermined without prior knowledge of the stopping time. This result provides an affirmative answer to a COLT open problem \citep{kornowski2024open} regarding whether stepsize-based acceleration can yield anytime convergence rates of $o(T^{-1})$. We further extend our theory to yield anytime convergence guarantees of $\exp(-\Omega(T/\kappa^{0.893}))$ for smooth and strongly convex optimization, with $\kappa$ being the condition number.

artificial intelligence, machine learning, stepsize schedule, (14 more...)

arXiv.org Machine Learning

2411.17668

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)

Add feedback

Open Problem: Anytime Convergence Rate of Gradient Descent

Kornowski, Guy, Shamir, Ohad

arXiv.org Artificial IntelligenceJun-19-2024

Recent results show that vanilla gradient descent can be accelerated for smooth convex objectives, merely by changing the stepsize sequence. We show that this can lead to surprisingly large errors indefinitely, and therefore ask: Is there any stepsize schedule for gradient descent that accelerates the classic $\mathcal{O}(1/T)$ convergence rate, at \emph{any} stopping time $T$?

long step, stepsize schedule, stepsize sequence, (9 more...)

arXiv.org Artificial Intelligence

2406.13888

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)

Add feedback

Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

Gürbüzbalaban, Mert, Hu, Yuanhan, Şimşekli, Umut, Zhu, Lingjiong

arXiv.org Artificial IntelligenceAug-29-2023

Cyclic and randomized stepsizes are widely used in the deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contain i.i.d. random stepsize, cyclic stepsize as well as the constant stepsize as special cases, and motivated by the literature which shows that heaviness of the tails (measured by the so-called "tail-index") in the SGD iterates is correlated with generalization, we study tail-index and provide a number of theoretical results that demonstrate how the tail-index varies on the stepsize scheduling. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to constant stepsize in terms of the tail behavior. We illustrate our theory on linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve even a heavier tail and be a viable alternative to cyclic and i.i.d. randomized stepsize rules.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2302.05516

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

Zhang, Ruqi, Li, Chunyuan, Zhang, Jianyi, Chen, Changyou, Wilson, Andrew Gordon

arXiv.org Machine LearningFeb-11-2019

The posteriors over neural network weights are high dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. We prove that our proposed learning rate schedule provides faster convergence to samples from a stationary distribution than SG-MCMC with standard decaying schedules. Moreover, we provide extensive experimental results to demonstrate the effectiveness of cyclical SG-MCMC in learning complex multimodal distributions, especially for fully Bayesian inference with modern deep neural networks.

algorithm, cyclical stochastic gradient mcmc, sg-mcmc, (12 more...)

arXiv.org Machine Learning

1902.03932

Country:

North America > United States > New York (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.82)

Add feedback