AITopics | hsgd

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Neural Information Processing Systems | Mar-16-2026, 21:29:23 GMT

As an incremental-gradient algorithm, the hybrid stochastic gradient descent (HSGD) enjoys the merits of both stochastic and full gradient methods for finite-sum minimization problems. However, the existing rate-of-convergence analysis for HSGD is made under with-replacement sampling (WRS) and is restricted to convex problems. It is not clear whether HSGD still carries these advantages under the common practice of without-replacement sampling (WoRS) for non-convex problems. In this paper, we affirmatively answer this open question by showing that under WoRS and for both convex and non-convex problems, it is still possible for HSGD (with constant step-size) to match full gradient descent in rate of convergence, while maintaining a sample-size-independent incremental first-order oracle (IFO) complexity comparable to that of stochastic gradient descent. For a special class of finite-sum problems with linear prediction models, our convergence results can be further improved in some cases. Extensive numerical results confirm our theoretical findings and demonstrate the favorable efficiency of WoRS-based HSGD.
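The difference between the two sampling schemes named in the abstract can be illustrated with a minimal sketch. This is not the HSGD algorithm from the paper: it is plain mini-batch SGD on a one-dimensional least-squares toy problem, and the function names (`wrs_indices`, `wors_indices`, `sgd`), the data, the batch size, and the step size are all assumptions chosen for illustration.

```python
import random

def wrs_indices(n, batch_size, rng):
    """With-replacement sampling (WRS): draws are independent, so an index may repeat."""
    return [rng.randrange(n) for _ in range(batch_size)]

def wors_indices(n, batch_size, rng):
    """Without-replacement sampling (WoRS): every index in the batch is distinct."""
    return rng.sample(range(n), batch_size)

def minibatch_gradient(w, xs, ys, idx):
    """Average gradient of the squared loss 0.5 * (w * x_i - y_i)^2 over the chosen indices."""
    return sum((w * xs[i] - ys[i]) * xs[i] for i in idx) / len(idx)

def sgd(xs, ys, sampler, batch_size, step, iters, seed=0):
    """Constant-step mini-batch SGD; only the index sampler differs between runs."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(iters):
        idx = sampler(len(xs), batch_size, rng)
        w -= step * minibatch_gradient(w, xs, ys, idx)
    return w

# Hypothetical noise-free toy data generated from y = 3x.
xs = [0.1 * k for k in range(1, 21)]
ys = [3.0 * x for x in xs]

w_wrs = sgd(xs, ys, wrs_indices, batch_size=5, step=0.1, iters=500)
w_wors = sgd(xs, ys, wors_indices, batch_size=5, step=0.1, iters=500)
print(w_wrs, w_wors)
```

Because the toy data are noise-free, both samplers drive `w` toward 3.0; the sketch only shows how each mini-batch is drawn and makes no claim about the convergence rates analyzed in the paper.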

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Pan Zhou, Xiaotong Yuan, Jiashi Feng

Neural Information Processing Systems | Feb-19-2026, 19:34:09 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, hsgd, ifo complexity, (12 more...)

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

19ba2b9448d5de25826f6eb408dab194-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 17:15:59 GMT

experiment, justification, theorem 3, (17 more...)

Country:

Europe > Germany > Saxony > Dresden (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Neural Information Processing Systems | Nov-20-2025, 22:22:00 GMT

hybrid stochastic gradient descent, with-replacement sampling, with-replacement sampling and convexity, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Pan Zhou, Xiaotong Yuan, Jiashi Feng

Neural Information Processing Systems | Nov-20-2025, 17:03:14 GMT

As an incremental-gradient algorithm, the hybrid stochastic gradient descent (HSGD) enjoys merits of both stochastic and full gradient methods for finite-sum problem optimization.

artificial intelligence, hsgd, machine learning, (14 more...)

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

High-Dimensional Privacy-Utility Dynamics of Noisy Stochastic Gradient Descent on Least Squares

Shurong Lin, Eric D. Kolaczyk, Adam Smith, Elliot Paquette

arXiv.org Artificial Intelligence | Oct-21-2025

The interplay between optimization and privacy has become a central theme in privacy-preserving machine learning. Noisy stochastic gradient descent (SGD) has emerged as a cornerstone algorithm, particularly in large-scale settings. These variants of gradient methods inject carefully calibrated noise into each update to achieve differential privacy, the gold standard notion of rigorous privacy guarantees. Prior work primarily provides various bounds on statistical risk and privacy loss for noisy SGD, yet the exact behavior of the process remains unclear, particularly in high-dimensional settings. This work leverages a diffusion approach to analyze noisy SGD precisely, providing a continuous-time perspective that captures both statistical risk evolution and privacy loss dynamics in high dimensions. Moreover, we study a variant of noisy SGD that does not require explicit knowledge of gradient sensitivity, unlike existing work that assumes or enforces sensitivity through gradient clipping. Specifically, we focus on the least squares problem with $\ell_2$ regularization.
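The setting described in this abstract (SGD on $\ell_2$-regularized least squares with noise added to each update and no gradient clipping) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the variant analyzed in the paper: the function name `noisy_sgd_ridge`, the Gaussian noise model, and every parameter value are hypothetical, and the noise scale here is not calibrated to any differential-privacy guarantee.

```python
import random

def noisy_sgd_ridge(xs, ys, lam, step, noise_std, iters, seed=0):
    """SGD on (1/n) * sum_i 0.5 * (w . x_i - y_i)^2 + (lam / 2) * ||w||^2,
    with independent Gaussian noise added to each coordinate of every update.
    No gradient clipping is applied (an assumption made for this sketch)."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    for _ in range(iters):
        i = rng.randrange(len(xs))  # pick one sample uniformly at random
        resid = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
        w = [
            wj - step * (resid * xj + lam * wj + noise_std * rng.gauss(0.0, 1.0))
            for wj, xj in zip(w, xs[i])
        ]
    return w

# Hypothetical toy data generated exactly from w* = [2, -1].
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
ys = [2.0, -1.0, 1.0, 3.0]

w_clean = noisy_sgd_ridge(xs, ys, lam=0.0, step=0.1, noise_std=0.0, iters=2000)
w_noisy = noisy_sgd_ridge(xs, ys, lam=0.1, step=0.1, noise_std=0.5, iters=2000)
print(w_clean, w_noisy)
```

With the noise and the regularizer switched off, the iterates recover the generating weights; with noise switched on, the iterates keep fluctuating, which reflects the trade-off between statistical risk and privacy that the abstract refers to.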

artificial intelligence, machine learning, noisy sgd, (16 more...)

2510.16687

Country: North America (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

19ba2b9448d5de25826f6eb408dab194-Paper-Conference.pdf

Neural Information Processing Systems | Oct-11-2025, 00:11:37 GMT

experiment, justification, theorem 3, (16 more...)

Country:

Europe > Germany > Saxony > Dresden (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Communications (0.68)

Emergence of heavy tails in homogenized stochastic gradient descent

Zhe Jiao, Martin Keller-Ressel

arXiv.org Artificial Intelligence | Feb-2-2024

It has repeatedly been observed that loss minimization by stochastic gradient descent leads to heavy-tailed distributions of neural network parameters. Here, we analyze a continuous diffusion approximation of SGD, called homogenized stochastic gradient descent, show that it behaves asymptotically heavy-tailed, and give explicit upper and lower bounds on its tail-index. We validate these bounds in numerical experiments and show that they are typically close approximations to the empirical tail-index of SGD iterates.

An important step in this direction has been taken in Gurbuzbalaban et al. [2021], where the tail behavior of SGD iterates is characterized in dependence on optimization parameters, dimension and Hessian curvature at the loss minimum. One limitation of Gurbuzbalaban et al. [2021] is that this link is described only qualitatively, but not quantitatively. Here, we provide an alternative approach through analyzing homogenized stochastic gradient descent, a diffusion approximation of SGD introduced in Paquette ...
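The abstract does not say how the empirical tail-index is measured. One common choice is the Hill estimator, sketched below purely as an assumption; the function name `hill_tail_index`, the choice of `k`, and the synthetic Pareto data are hypothetical and are not taken from the paper.

```python
import math
import random

def hill_tail_index(samples, k):
    """Hill estimate of the tail index from the k largest absolute values."""
    xs = sorted((abs(x) for x in samples), reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    threshold = xs[k]  # the (k + 1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess

# Synthetic check on Pareto draws whose true tail index is 3.
rng = random.Random(0)
pareto_draws = [rng.paretovariate(3.0) for _ in range(20000)]
estimate = hill_tail_index(pareto_draws, k=2000)
print(estimate)
```

In practice the estimate depends on how many upper order statistics `k` are used, so it is usually inspected over a range of `k` rather than read off at a single value.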

approximation, gradient descent, stochastic gradient descent, (14 more...)

2402.01382

Country:

Europe > Germany > Saxony > Dresden (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Pan Zhou, Xiaotong Yuan, Jiashi Feng

Neural Information Processing Systems | Feb-14-2020, 07:43:19 GMT

hybrid stochastic gradient descent, new insight, with-replacement sampling and convexity, (2 more...)

Genre: Research Report (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Pan Zhou, Xiaotong Yuan, Jiashi Feng

Neural Information Processing Systems | Dec-31-2018

artificial intelligence, hsgd, machine learning, (14 more...)

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
