AITopics | beating sgd saturation

Beating SGD Saturation with Tail-Averaging and Minibatching

Neural Information Processing Systems, Dec-25-2025, 08:43:54 GMT

While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are still poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, minibatching and averaging, in particular tail averaging. Our results show how these different variants of SGD can be combined to achieve optimal learning rates, also providing practical insights. A novel key result is that tail averaging allows faster convergence rates than uniform averaging in the nonparametric setting. Further, we show that a combination of tail averaging and minibatching allows more aggressive step-size choices than either component alone.
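To make the moving parts concrete, here is a minimal sketch of multi-pass minibatch SGD for least squares that returns both the uniform average and the tail average of the iterates. It assumes a finite-dimensional linear model rather than the nonparametric (kernel) setting studied in the paper. The function name `minibatch_sgd_least_squares`, the constant step size, reshuffling once per pass, and averaging over the last half of the iterates (`tail_fraction=0.5`) are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np


def minibatch_sgd_least_squares(X, y, step_size, batch_size, n_passes,
                                tail_fraction=0.5, seed=0):
    """Hypothetical sketch: multi-pass minibatch SGD on the least-squares
    objective (1 / (2n)) * ||X w - y||^2, started from w = 0.

    Returns the last iterate, the uniform average of all iterates, and the
    tail average over the final `tail_fraction` of the iterates.
    """
    n, d = X.shape
    if not 1 <= batch_size <= n:
        raise ValueError("batch_size must be between 1 and n")
    if n_passes < 1:
        raise ValueError("n_passes must be at least 1")
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    iterates = []
    for _ in range(n_passes):
        # Assumed sampling scheme: reshuffle once per pass, then sweep
        # disjoint minibatches (a trailing partial batch is dropped).
        perm = rng.permutation(n)
        for start in range(0, n - batch_size + 1, batch_size):
            idx = perm[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            grad = X[idx].T @ residual / batch_size  # minibatch gradient
            w = w - step_size * grad
            iterates.append(w.copy())
    iterates = np.asarray(iterates)
    T = len(iterates)
    # Uniform averaging weights every iterate equally, including early ones;
    # tail averaging only keeps the last portion of the trajectory.
    tail_start = min(int((1.0 - tail_fraction) * T), T - 1)
    uniform_avg = iterates.mean(axis=0)
    tail_avg = iterates[tail_start:].mean(axis=0)
    return w, uniform_avg, tail_avg


# Usage on hypothetical synthetic data with a known target w_star.
data_rng = np.random.default_rng(1)
n, d = 200, 5
X = data_rng.standard_normal((n, d))
w_star = data_rng.standard_normal(d)
y = X @ w_star + 0.1 * data_rng.standard_normal(n)
last, uniform_avg, tail_avg = minibatch_sgd_least_squares(
    X, y, step_size=0.1, batch_size=10, n_passes=20)
print(np.linalg.norm(uniform_avg - w_star), np.linalg.norm(tail_avg - w_star))
```

Uniform averaging gives the early iterates, which are still far from the solution, the same weight as the late ones. Tail averaging discards that initial transient, which is one intuition for why it can converge faster. Varying `step_size` and `batch_size` in this sketch is a way to explore the step-size/minibatch interplay, but it does not reproduce the paper's bounds.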

beating sgd saturation, name change, tail-averaging and minibatching

Genre: Research Report > New Finding (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)

Reviews: Beating SGD Saturation with Tail-Averaging and Minibatching

Neural Information Processing Systems, Jan-23-2025, 13:10:02 GMT

I'll keep my score and vote for accepting this paper. That said, the techniques for bounding each term seem borrowed and adapted from previous papers analyzing SGD for least-squares problems; the related papers are adequately cited. Quality and clarity: Theoretically speaking, the paper is self-contained, provides proofs of all theorems, and clearly discusses all the assumptions it makes. Furthermore, despite the number of parameters involved in the analysis, the main results (Theorem 1 and Corollary 1) are very clear and are clearly compared with related work. However, the experimental section may lack a real dataset for which r can be computed and on which the difference between tail and uniform averaging could be observed.

beating sgd saturation, main result, tail-averaging and minibatching

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Beating SGD Saturation with Tail-Averaging and Minibatching

Neural Information Processing Systems, Oct-9-2024, 23:55:59 GMT

beating sgd saturation, learning, tail-averaging and minibatching

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.67)

Beating SGD Saturation with Tail-Averaging and Minibatching

Muecke, Nicole, Neu, Gergely, Rosasco, Lorenzo

Neural Information Processing Systems, Mar-19-2020, 01:47:10 GMT

beating sgd saturation, learning, tail-averaging and minibatching
