Random Reshuffling




Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems

Neural Information Processing Systems

However, known lower bounds ignore the problem's geometry, including its condition number, whereas the upper bounds explicitly depend on it. Perhaps surprisingly, we prove that when the condition number is taken into account, without-replacement SGD does not significantly improve on with-replacement SGD in terms of worst-case bounds, unless the number of epochs (passes over the data) is larger than the condition number.




Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences

Neural Information Processing Systems

Gradient compression is a popular technique for improving the communication complexity of stochastic first-order methods in distributed training of machine learning models. However, existing works consider only with-replacement sampling of stochastic gradients. In contrast, it is well known in practice, and was recently confirmed in theory, that stochastic methods based on without-replacement sampling, e.g., the Random Reshuffling (RR) method, perform better than those that sample gradients with replacement. In this work, we close this gap in the literature and provide the first analysis of methods that combine gradient compression with without-replacement sampling. We first develop a distributed variant of random reshuffling with gradient compression (Q-RR) and show how to reduce the variance coming from gradient quantization through the use of control iterates. Next, to better fit Federated Learning applications, we incorporate local computation and propose a variant of Q-RR called Q-NASTYA, which uses local gradient steps and different local and global stepsizes. We then show how to reduce compression variance in this setting as well. Finally, we prove convergence results for the proposed methods and outline several settings in which they improve upon existing algorithms.
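
To illustrate the control-iterate idea this abstract describes, here is a minimal sketch in the style of DIANA-type compressed methods: each worker compresses the difference between its current gradient and a locally maintained control iterate, rather than the gradient itself. Everything here (the rand-k compressor, the quadratic losses, all parameter names and values) is an assumption for illustration, not the paper's actual Q-RR implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(v, k):
    """Unbiased rand-k sparsifier: keep k random coordinates, rescaled by d/k.
    A stand-in for any unbiased compressor/quantizer (hypothetical choice)."""
    idx = rng.choice(v.size, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (v.size / k)
    return out

# Toy setup: worker i owns n local losses f_ij(x) = 0.5 * (a_ij^T x - b_ij)^2.
workers, n, d, k, alpha, lr = 4, 8, 20, 5, 0.5, 0.02
a = rng.normal(size=(workers, n, d))
b = rng.normal(size=(workers, n))

x = np.zeros(d)
h = np.zeros((workers, d))  # control iterates, one per worker

for epoch in range(100):
    # Without-replacement component: every worker reshuffles its local data.
    perms = [rng.permutation(n) for _ in range(workers)]
    for t in range(n):  # one epoch = n communication rounds
        g_hat = np.zeros(d)
        for i in range(workers):
            j = perms[i][t]  # this epoch's t-th without-replacement sample
            g = (a[i, j] @ x - b[i, j]) * a[i, j]  # local stochastic gradient
            m = rand_k(g - h[i], k)  # compress the *difference*, not g itself
            g_hat += (h[i] + m) / workers  # server's unbiased gradient estimate
            h[i] = h[i] + alpha * m  # control iterate drifts toward g
        x = x - lr * g_hat  # global step with the aggregated estimate
```

Because each control iterate tracks its worker's local gradient along the trajectory, the compressed messages shrink over time, which is the variance-reduction effect the abstract refers to.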


Random Reshuffling: Simple Analysis with Vast Improvements

Neural Information Processing Systems

Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its sibling Stochastic Gradient Descent (SGD), RR is usually faster in practice and enjoys significant popularity in convex and non-convex optimization. The convergence rate of RR has attracted substantial attention recently and, for strongly convex and smooth functions, it was shown to converge faster than SGD if 1) the stepsize is small, 2) the gradients are bounded, and 3) the number of epochs is large. We remove these three assumptions and improve the dependence on the condition number from $\kappa^2$ to $\kappa$.
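
Since the abstract hinges on the contrast between the two sampling schemes, here is a minimal, self-contained sketch of the difference. The toy least-squares objective and all parameter values are assumptions for illustration, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum objective: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
n, d, lr, epochs = 100, 10, 0.01, 50
a = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad(x, i):
    """Gradient of the i-th summand at x."""
    return (a[i] @ x - b[i]) * a[i]

# SGD: draws an index *with* replacement at every step.
x_sgd = np.zeros(d)
for _ in range(epochs * n):
    i = rng.integers(n)
    x_sgd -= lr * grad(x_sgd, i)

# Random Reshuffling: draws a fresh permutation once per epoch, then makes
# a full pass over the data in that order (each point used exactly once).
x_rr = np.zeros(d)
for _ in range(epochs):
    for i in rng.permutation(n):
        x_rr -= lr * grad(x_rr, i)
```

Both methods take the same number of stochastic steps per epoch; the only difference is the index sequence, which is exactly the distinction the convergence analyses compare.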


Random Reshuffling is Not Always Better

Neural Information Processing Systems

Many learning algorithms, such as stochastic gradient descent, are affected by the order in which training examples are used. It is often observed that sampling the training examples without replacement, also known as random reshuffling, causes learning algorithms to converge faster. We give a counterexample to the Operator Inequality of Noncommutative Arithmetic and Geometric Means, a longstanding conjecture that relates to the performance of random reshuffling in learning algorithms (Recht and Ré, "Toward a noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences," COLT 2012). We use this to give an example of a learning task and algorithm for which with-replacement random sampling outperforms random reshuffling.
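
For context, one commonly stated (weak) form of the conjectured inequality says that averaging matrix products over distinct indices (sampling without replacement) should do no worse, in operator norm, than averaging over all index tuples (sampling with replacement). The display below paraphrases that form for positive semidefinite matrices $A_1, \dots, A_n$ and products of length $m \le n$; the exact statement the paper addresses may differ in details.

\[
\left\| \frac{(n-m)!}{n!}
  \sum_{\substack{1 \le j_1, \dots, j_m \le n \\ j_1, \dots, j_m \ \text{distinct}}}
  A_{j_1} A_{j_2} \cdots A_{j_m} \right\|
\;\le\;
\left\| \frac{1}{n^m}
  \sum_{j_1, \dots, j_m = 1}^{n}
  A_{j_1} A_{j_2} \cdots A_{j_m} \right\|
\]

The paper's counterexample shows that an inequality of this type can fail, which in turn yields a constructed task on which with-replacement sampling beats reshuffling.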