AITopics | svrg

svrg

Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space

Neural Information Processing Systems, Jun-17-2026, 09:23:04 GMT

Recently, optimization on Riemannian manifolds has provided valuable insights to the optimization community. In this regard, extending these methods to the Wasserstein space is of particular interest, since optimization on Wasserstein space is closely connected to practical sampling processes. Generally, the standard (continuous) optimization method on Wasserstein space is the Riemannian gradient flow (i.e., Langevin dynamics when minimizing KL divergence). In this paper, we aim to enrich the family of continuous optimization methods in the Wasserstein space by extending the gradient flow on it to the stochastic gradient descent (SGD) flow and the stochastic variance reduction gradient (SVRG) flow. By leveraging the properties of Wasserstein space, we construct stochastic differential equations (SDEs) to approximate the corresponding discrete Euclidean dynamics of the desired Riemannian stochastic methods. Then, we obtain the flows in Wasserstein space via the Fokker-Planck equation. Finally, we establish convergence rates of the proposed stochastic flows, which align with those known in the Euclidean setting.
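As an illustration of the baseline the abstract mentions (Langevin dynamics as the gradient flow of KL divergence), the following is a minimal sketch of an Euler-Maruyama discretization of dX = -∇V(X) dt + √2 dW, whose stationary density is proportional to exp(-V). It is not the paper's SGD or SVRG flow. The function name `langevin_samples`, the quadratic potential, and all numeric settings are assumptions chosen for the example.

```python
import math
import random

def langevin_samples(grad_v, x0, step, n_steps, n_particles, seed=0):
    """Euler-Maruyama discretization of dX = -grad V(X) dt + sqrt(2) dW."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    particles = [x0] * n_particles
    for _ in range(n_steps):
        # Each particle takes a gradient step on V plus Gaussian noise.
        particles = [
            x - step * grad_v(x) + noise_scale * rng.gauss(0.0, 1.0)
            for x in particles
        ]
    return particles

# Hypothetical target: V(x) = x^2 / 2, so exp(-V) is a standard normal density.
samples = langevin_samples(lambda x: x, x0=3.0, step=0.01,
                           n_steps=500, n_particles=1000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
```

With these settings the empirical mean and variance of `samples` should land near 0 and 1, up to discretization and Monte Carlo error.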

artificial intelligence, inequality, machine learning, (14 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

A More on the background

Neural Information Processing Systems, Apr-24-2026, 13:30:38 GMT

A.1 SVRG and SCSG

Here we provide the pseudocode for SVRG (Algorithm 2) and SCSG (Algorithm 3) as seen in Lei et al. [35]. The idea of SVRG (Algorithm 2) is to reuse past full gradient computations (line 3) to reduce the variance of the current stochastic gradient estimate (line 7) before the parameter update (line 8). Note that N = 1 corresponds to a GD step (i.e., v [...]). SVRG achieves linear convergence O(1/T) using the semi-stochastic gradient. The key difference is that SCSG (Algorithm 3) considers a sequence of time-varying batch sizes (B_t and b_t) and employs geometric sampling to generate the number of parameter update steps N_t in each iteration (line 6), instead of fixing the batch sizes and the number of updates as done in SVRG. In particular, when finding an ε-approximate solution (Definition 1) for optimizing smooth non-convex objectives, Lei et al. [35] prove that SCSG is never worse than SVRG in convergence rate and significantly outperforms SVRG when the required ε is small.
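The SVRG idea described above (a full gradient computed at a snapshot, then reused to correct each stochastic gradient) can be sketched as follows. This is a minimal illustration for a scalar parameter, not a transcription of Algorithm 2: the function name `svrg`, the toy least-squares components, the step size, and the choice of the last inner iterate as the next snapshot are all assumptions made for the example.

```python
import random

def svrg(grad_i, n, w0, step, epochs, inner_steps, seed=0):
    """Minimal SVRG sketch for (1/n) * sum_i f_i(w) with a scalar w."""
    rng = random.Random(seed)
    w_snap = w0
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per outer iteration.
        full_grad = sum(grad_i(i, w_snap) for i in range(n)) / n
        w = w_snap
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Semi-stochastic gradient: the stochastic gradient at w,
            # corrected by the same component's gradient at the snapshot.
            v = grad_i(i, w) - grad_i(i, w_snap) + full_grad
            w -= step * v
        # Assumption: the last inner iterate becomes the next snapshot.
        w_snap = w
    return w_snap

# Hypothetical toy problem: f_i(w) = 0.5 * (a_i * w - b_i)^2.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 3.0, 7.0, 9.0]
grad = lambda i, w: a[i] * (a[i] * w - b[i])
w_star = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)
w_hat = svrg(grad, n=len(a), w0=0.0, step=0.01, epochs=50, inner_steps=50)
```

Averaged over the random index, `v` equals the full gradient at `w`, so the estimate is unbiased, and its variance shrinks as `w` and `w_snap` both approach the minimizer `w_star`.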

agent, artificial intelligence, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.54)
Information Technology > Artificial Intelligence > Machine Learning (0.54)

Without-Replacement Sampling for Stochastic Gradient Methods

Ohad Shamir, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel, ohad.shamir@weizmann.ac.il

Neural Information Processing Systems, Apr-22-2026, 03:44:07 GMT

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data. Moreover, we describe a useful application of these results in the context of distributed optimization with randomly-partitioned data, yielding a nearly-optimal algorithm for regularized least squares (in terms of both communication complexity and runtime complexity) under broad parameter regimes. Our proof techniques combine ideas from stochastic optimization, adversarial online learning and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.
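To make the distinction concrete, the sketch below contrasts the two sampling schemes for one pass over n data points: with replacement draws each index independently, while without replacement visits a random permutation. It only illustrates the sampling schemes and does not reproduce the paper's analysis. The names `pass_indices` and `sgd`, the toy data, and the step size are assumptions made for the example.

```python
import random

def pass_indices(n, with_replacement, rng):
    """Return the order in which n data points are visited in one pass."""
    if with_replacement:
        # Independent draws: some indices may repeat, others may be skipped.
        return [rng.randrange(n) for _ in range(n)]
    order = list(range(n))
    rng.shuffle(order)  # random permutation: each index appears exactly once
    return order

def sgd(a, b, step, passes, with_replacement, seed=0):
    """SGD on the hypothetical components f_i(w) = 0.5 * (a_i * w - b_i)^2."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(passes):
        for i in pass_indices(len(a), with_replacement, rng):
            w -= step * a[i] * (a[i] * w - b[i])
    return w

# Toy data with b_i = 2 * a_i, so every component is minimized at w = 2.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
w_without = sgd(a, b, step=0.05, passes=50, with_replacement=False)
w_with = sgd(a, b, step=0.05, passes=50, with_replacement=True)
```

In practice the without-replacement variant is often just a shuffle of the dataset at the start of each pass, which is one reason it tends to be the easier scheme to implement.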

algorithm, artificial intelligence, machine learning, (17 more...)

Country:

Asia > Middle East > Israel (0.40)
Europe (0.28)

Industry: Education (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

Stochastic Variance Reduction Methods for Saddle-Point Problems

Balamurugan Palaniappan, Francis Bach

Neural Information Processing Systems, Mar-23-2026, 03:04:08 GMT

We consider convex-concave saddle-point problems where the objective functions may be split in many components, and extend recent stochastic variance reduction methods (such as SVRG or SAGA) to provide the first large-scale linearly convergent algorithms for this class of problems which are common in machine learning. While the algorithmic extension is straightforward, it comes with challenges and opportunities: (a) the convex minimization analysis does not apply and we use the notion of monotone operators to prove convergence, showing in particular that the same algorithm applies to a larger class of problems, such as variational inequalities, (b) there are two notions of splits, in terms of functions, or in terms of partial derivatives, (c) the split does need to be done with convex-concave terms, (d) non-uniform sampling is key to an efficient algorithm, both in theory and practice, and (e) these incremental algorithms can be easily accelerated using a simple extension of the "catalyst" framework, leading to an algorithm which is always superior to accelerated batch algorithms.

artificial intelligence, machine learning, survey article, (17 more...)

Genre: Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Stochastic Nested Variance Reduction for Nonconvex Optimization

Neural Information Processing Systems, Mar-16-2026, 18:25:30 GMT

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient to further reduce its variance in each iteration.

artificial intelligence, machine learning, proceedings, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

The Lingering of Gradients: How to Reuse Gradients Over Time

Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang

Neural Information Processing Systems, Feb-14-2026, 06:07:19 GMT

This is meaningful because in most applications, the time complexities for evaluating gradients at different points are of the same magnitude.

artificial intelligence, gradient, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

cd00692c3bfe59267d5ecfac5310286c-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-14-2026, 04:11:27 GMT

experiment, mini-batch size, svrg, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Aaron Defazio, Leon Bottou

Neural Information Processing Systems, Feb-12-2026, 19:32:42 GMT

SVR methods use control variates to reduce the variance of the traditional stochastic gradient descent (SGD) estimate $f_i'(w)$ of the full gradient $f'(w)$. Control variates are a classical technique for reducing the variance of a stochastic quantity without introducing bias. Say we have some random variable X.
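A small numeric sketch of the control variate technique: to estimate E[X], subtract a correlated quantity Y whose mean is known and add that mean back, which leaves the expectation unchanged while lowering the variance. The specific choices here (X = exp(U) with U uniform on [0, 1], the control Y = U, and the hand-picked coefficient `c`) are assumptions for illustration and do not come from the paper.

```python
import math
import random
import statistics

rng = random.Random(0)
u = [rng.random() for _ in range(10_000)]

# Samples of X = exp(U); the true mean is e - 1.
x = [math.exp(v) for v in u]

# Control variate Y = U with known mean 0.5. The fixed coefficient c is an
# assumption, picked close to Cov(X, Y) / Var(Y) for this toy case.
c = 1.69
z = [xi - c * (ui - 0.5) for xi, ui in zip(x, u)]

mean_x, var_x = statistics.mean(x), statistics.variance(x)
mean_z, var_z = statistics.mean(z), statistics.variance(z)
```

Because `c` is fixed in advance, `z` has the same expectation as `x` but a much smaller variance. In SVR methods the same pattern appears with the stochastic gradient at the current point playing the role of X and the stochastic gradient at a stored snapshot playing the role of Y.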

artificial intelligence, machine learning, variance reduction, (15 more...)

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

b58f7d184743106a8a66028b7a28937c-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-9-2026, 22:31:09 GMT

algorithm, dynamic regret, experiment section, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

SVRG and Beyond via Posterior Correction

Daheim, Nico, Möllenhoff, Thomas, Ang, Ming Liang, Khan, Mohammad Emtiyaz

arXiv.org Artificial Intelligence, Dec-2-2025

Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed up training by using gradient corrections, but have seen limited success in deep learning. Here, we show surprising new foundational connections of SVRG to a recently proposed Bayesian method called posterior correction. Specifically, we show that SVRG is recovered as a special case of posterior correction over the isotropic-Gaussian family, while novel extensions are automatically obtained by using more flexible exponential families. We derive two new SVRG variants by using Gaussian families: first, a Newton-like variant that employs novel Hessian corrections, and second, an Adam-like extension that improves pretraining and finetuning of Transformer language models. This is the first work to connect SVRG to Bayes and use it to boost variational training for deep networks.

large language model, machine learning, natural language, (17 more...)

2512.0193

Country:

Europe (0.28)
Asia > Japan (0.28)

Genre: Research Report (0.83)
