Gradient Variance



A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems

For training ResNet-50 on ImageNet, our 5-bit block Householder quantizer achieves only 0.5% validation accuracy loss relative to QAT, comparable to the existing INT8 baseline.


Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

Neural Information Processing Systems

ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. However, recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic, non-smooth optimization landscapes with exploding gradient variance, which leads to slow convergence. This contrasts with the conventional belief that reparameterization methods have low gradient estimation variance in problems such as training deep generative models. To understand this phenomenon, we conduct a theoretical examination of model-based RP PGMs and search for solutions to the optimization difficulties. Specifically, we analyze the convergence of model-based RP PGMs and pinpoint the smoothness of function approximators as a major factor affecting the quality of gradient estimation. Based on our analysis, we propose a spectral normalization method to mitigate the exploding-variance issue caused by long model unrolls. Our experimental results demonstrate that proper normalization significantly reduces the gradient variance of model-based RP PGMs. As a result, the performance of the proposed method is comparable to, or better than, that of other gradient estimators, such as the Likelihood Ratio (LR) gradient estimator.
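To make the proposed mitigation concrete, here is a minimal PyTorch sketch, assuming an illustrative residual MLP dynamics model (the class name, dimensions, and architecture are ours, not the paper's): wrapping each linear layer in spectral normalization bounds its Lipschitz constant, which is what keeps reparameterization gradients from exploding over long differentiable unrolls.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

class SpectralDynamicsModel(nn.Module):
    """Illustrative learned dynamics model with spectrally normalized layers.

    Bounding each layer's spectral norm keeps the unrolled model smooth,
    which limits how state perturbations (and hence pathwise-gradient
    variance) compound over long horizons.
    """

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(state_dim + action_dim, hidden)),
            nn.Tanh(),
            spectral_norm(nn.Linear(hidden, hidden)),
            nn.Tanh(),
            spectral_norm(nn.Linear(hidden, state_dim)),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Residual update; in model-based RP PGMs the return is
        # differentiated end to end through many such steps.
        return state + self.net(torch.cat([state, action], dim=-1))
```

During training, rollouts are unrolled through this model and the reparameterized return is differentiated through every step, so the spectral bound directly controls how perturbations compound across time.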


Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients

Neural Information Processing Systems

Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods--which we collectively refer to as Markov chain score ascent (MCSA) methods--can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.
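A hedged sketch of the general MCSA recipe described above (the names and signatures are illustrative, not the paper's API): each SGD step advances a set of Markov chains targeting the posterior p and uses their current states to estimate the score-ascent gradient of the inclusive KL. Averaging over many parallel chains, in the spirit of pMCSA, reduces the gradient variance.

```python
import torch

def mcsa_step(chains, lam, logq, mcmc_kernel, opt):
    """One Markov chain score ascent step (sketch, not the paper's code).

    Minimizing the inclusive KL(p || q_lam) amounts to ascending
    E_p[grad_lam log q_lam(z)]. The expectation over the posterior p is
    approximated with the states of Markov chains targeting p, so the
    gradient estimate is biased until the chains mix. `logq` and
    `mcmc_kernel` are assumed user-supplied; `lam` is a parameter tensor
    registered with the optimizer `opt`.
    """
    # Advance every chain one MCMC step toward p; detach so gradients
    # flow only through log q_lam, not through the kernel.
    chains = torch.stack([mcmc_kernel(z) for z in chains]).detach()

    opt.zero_grad()
    loss = -logq(chains, lam).mean()  # Monte Carlo -E_p[log q_lam(z)]
    loss.backward()                   # negated score-ascent direction
    opt.step()
    return chains
```

Running more chains in parallel tightens the Monte Carlo average at the cost of memory, which is the trade-off behind pMCSA's tighter variance bound.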


Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes

Neural Information Processing Systems

Training a large-scale deep neural network on a large-scale dataset is challenging and time-consuming. The recent breakthrough of large-batch optimization is a promising way to tackle this challenge. However, although advanced algorithms such as LARS and LAMB succeed on classification models, the complicated pipelines of dense visual prediction tasks such as object detection and segmentation still suffer a heavy performance drop in the large-batch training regime. To address this challenge, we propose a simple yet effective algorithm, named Adaptive Gradient Variance Modulator (AGVM), which can train dense visual predictors with very large batch sizes and offers several benefits over prior art. Firstly, AGVM can align the gradient variances between different modules in a dense visual predictor, such as the backbone, feature pyramid network (FPN), and detection and segmentation heads.
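A simplified sketch of the variance-alignment idea (this is our reading of the abstract, not the published AGVM algorithm): after backpropagation, measure each module's empirical gradient variance and rescale its gradients so the variance matches an anchor module such as the backbone.

```python
import torch

@torch.no_grad()
def align_gradient_variances(modules, anchor="backbone", eps=1e-8):
    """Rescale per-module gradients to match the anchor's gradient variance.

    `modules` maps module names (e.g. "backbone", "fpn", "head") to lists
    of parameters whose .grad fields are already populated. Hypothetical
    helper in the spirit of AGVM; the real algorithm is more elaborate.
    """
    def grad_var(params):
        g = torch.cat([p.grad.flatten() for p in params if p.grad is not None])
        return g.var()

    target = grad_var(modules[anchor])
    for name, params in modules.items():
        if name == anchor:
            continue
        scale = (target / (grad_var(params) + eps)).sqrt()
        for p in params:
            if p.grad is not None:
                p.grad.mul_(scale)  # equalize variance before the update
```

Called between `loss.backward()` and `optimizer.step()`, this keeps modules with mismatched gradient statistics from destabilizing each other at very large batch sizes.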



Radial Compensation: Stable and Semantically Decoupled Generative Models on Riemannian Manifolds

Papamichals, Marios; Ruane, Regina

arXiv.org Machine Learning

Generative models on curved spaces rely on charts to map Euclidean spaces to manifolds. Exponential maps preserve geodesics but have stiff, radius-dependent Jacobians, while volume-preserving charts maintain densities but distort geodesic distances. Both approaches entangle curvature with model parameters, inflating gradient variance. In high-dimensional latent normalizing flows, the wrapped exponential prior can stretch radii far beyond the curvature scale, leading to poor test likelihoods and stiff solvers. We introduce Radial Compensation (RC), an information-geometric method that selects the base density in the tangent space so that the likelihood depends only on geodesic distance from a pole, decoupling parameter semantics from curvature. RC lets radial parameters retain their usual meaning in geodesic units, while the chart can be tuned as a numerical preconditioner. We extend RC to manifolds with known geodesic polar volume and show that RC is the only construction for geodesic-radial likelihoods with curvature-invariant Fisher information. We derive the Balanced-Exponential (bExp) chart family, balancing volume distortion and geodesic error. Under RC, all bExp settings preserve the same manifold density and Fisher information, with smaller dial values reducing gradient variance and flow cost. Empirically, RC yields stable generative models across densities, VAEs, flows on images and graphs, and protein models. RC improves likelihoods, restores clean geodesic radii, and prevents radius blow-ups in high-dimensional flows, making RC-bExp a robust default for likelihood-trained generative models on manifolds.
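To illustrate the core construction on a concrete manifold: on hyperbolic space H^n (constant curvature -1), the exponential map has Jacobian determinant (sinh r / r)^(n-1) at geodesic radius r, so a tangent-space base density can be compensated by exactly that factor to make the manifold likelihood a pure function of geodesic distance. The sketch below assumes this standard volume factor; the function names and the radial profile are illustrative, and the paper's bExp chart family is not reproduced.

```python
import torch

def rc_tangent_logpdf(r: torch.Tensor, log_f, dim: int) -> torch.Tensor:
    """Radial compensation on hyperbolic space H^dim (hedged sketch).

    The exponential map of H^dim has Jacobian determinant
    (sinh r / r)^(dim-1) at geodesic radius r = ||v||. For the pushed-
    forward *manifold* density to equal a pure radial profile exp(log_f(r))
    in geodesic units, the tangent-space base density must absorb that
    factor:  p_T(v) = f(r) * (sinh r / r)^(dim-1).
    `log_f` is an illustrative user-supplied radial log-profile.
    """
    compensation = (dim - 1) * (torch.log(torch.sinh(r)) - torch.log(r))
    return log_f(r) + compensation
```

Sampling then draws a direction uniformly and a radius from the compensated radial law, so the resulting likelihood depends only on geodesic distance from the pole, which is the decoupling of parameter semantics from curvature that the abstract describes.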



Appendix A.1 Stochastic Rounding

Neural Information Processing Systems

A realization of stochastic rounding is shown in Figure 4, here applied to a 24-bit single-precision floating-point mantissa (a generic sketch follows below).

A.2 Representation mapping increases the gradient variance: linear layer example

A linear layer is essentially a matrix multiplication. Inequality (18) supports our Assumption 2 (iii,b). The proof follows that of Bottou et al. The experiments in this paper were run with the following numbers of GPUs: ResNet18 on CIFAR10 runs on 1 V100 GPU with a batch size of 128.
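For reference, a generic NumPy sketch of stochastic rounding (the grid spacing and interface are illustrative; the paper's quantizer is more elaborate): each value rounds up with probability equal to its fractional offset from the lower grid point, which makes the rounding unbiased at the cost of added variance.

```python
import numpy as np

def stochastic_round(x, step, rng=None):
    """Unbiased stochastic rounding to a grid of spacing `step`.

    Each value rounds up with probability equal to its fractional distance
    to the lower grid point, so E[stochastic_round(x)] == x: quantization
    adds variance to the gradients but no bias.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(x, dtype=np.float64) / step
    lower = np.floor(scaled)
    prob_up = scaled - lower                      # fractional part in [0, 1)
    round_up = rng.random(size=scaled.shape) < prob_up
    return (lower + round_up) * step
```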


Main remarks regarding baseline, scalability, complexity, and the full-batch setting

Neural Information Processing Systems

We thank the reviewers for their valuable comments and suggestions. The reviewers' main concern is the lack of an RQVI baseline (the RQVI procedure led to computational instability). We evaluate two model families (GLM, BNN) and five datasets (Boston, Fires, Life Expect., Frisk, and Metro) with a learning-rate analysis. We do not claim that this method is suitable for high-dimensional posteriors. It is accurate that the method will not be viable without this property.