Reviews: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
UPDATE: I've read the other reviews and the rebuttal. I am keeping my score - this is a good paper. The study of stochastic gradient descent in the over-parameterized setting is a popular and important trend in the recent development of huge-scale optimization for deep learning. The authors propose a very basic and classical method, consisting of well-known algorithmic building blocks (SGD with an Armijo-type line search), together with its first theoretical justification under the "interpolation assumption". The proof of convergence (for example, Theorem 2) mainly consists of standard arguments (those used to prove convergence of the classical non-stochastic gradient method under Lipschitz-continuous gradients).
This paper brings a classic idea into the present and makes progress on a vexing problem with SGD --- setting the step size. The authors provide theoretical as well as empirical evidence that their method is useful. The assumptions may be somewhat limiting; one version requires strong convexity, and when that is relaxed, other assumptions must be made. But this work points to a path that may be useful in the long run. An important way of contributing in ML is bridging fields; that can mean bringing in ideas that are state-of-the-art in other fields, or it can mean revisiting classic ideas in new ways.
Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien
Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques to automatically set the step-size when training models that can interpolate the data. In the interpolation setting, we prove that SGD with a stochastic variant of the classic Armijo line-search attains the deterministic convergence rates for both convex and strongly-convex functions. Under additional assumptions, SGD with Armijo line-search is shown to achieve fast convergence for non-convex functions. Furthermore, we show that stochastic extra-gradient with a Lipschitz line-search attains linear convergence for an important class of non-convex functions and saddle-point problems satisfying interpolation.
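To make the abstract's core idea concrete, here is a minimal sketch of SGD with a backtracking Armijo line-search on the mini-batch loss, run on an over-parameterized (interpolating) least-squares problem. All hyperparameter values and the step-size-reset heuristic are illustrative assumptions, not the paper's exact algorithm or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parameterized least squares: more parameters than samples,
# so a zero-loss interpolating solution exists (the interpolation setting).
n, d = 20, 50
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)  # noiseless targets => interpolation holds

def batch_loss_grad(w, idx):
    """Loss and gradient of the mini-batch least-squares objective."""
    r = X[idx] @ w - y[idx]
    return 0.5 * np.mean(r ** 2), X[idx].T @ r / len(idx)

def sgd_armijo(w, steps=500, eta_max=10.0, c=0.1, beta=0.7, batch=5):
    """SGD where each step-size is set by backtracking until the
    stochastic Armijo condition holds on the sampled mini-batch:
        f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2.
    (eta_max, c, beta are illustrative choices.)"""
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        loss, grad = batch_loss_grad(w, idx)
        g2 = grad @ grad
        eta = eta_max  # reset each iteration (one simple variant)
        while batch_loss_grad(w - eta * grad, idx)[0] > loss - c * eta * g2:
            eta *= beta  # backtrack until the Armijo condition holds
        w = w - eta * grad
    return w

w = sgd_armijo(np.zeros(d))
full_loss = 0.5 * np.mean((X @ w - y) ** 2)
```

Because the targets are noiseless and the model can interpolate, the mini-batch gradients vanish at the solution, and the line-search converges quickly without any hand-tuned step-size.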
r/MachineLearning - [R]: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
The authors use a classic Armijo line-search approach in the context of SGD to automatically tune the step-size when training neural networks. They're also able to prove convergence results for minimizing convex and non-convex objective functions satisfying certain growth conditions. An aside, but as an optimization-head myself, it's nice to see some traditional optimization ideas make their way into an ML context.