Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions

Feb-10-2026–arXiv.org Machine Learning

Stochastic gradient methods are central to large-scale learning, but they treat mini-batch gradients as unbiased estimators, which classical decision theory shows are inadmissible in high dimensions. We formulate gradient computation as a high-dimensional estimation problem and introduce a framework based on Stein-rule shrinkage. We construct a gradient estimator that adaptively contracts noisy mini-batch gradients toward a stable estimator derived from historical momentum. The shrinkage intensity is determined in a data-driven manner using an online estimate of gradient noise variance, leveraging statistics from adaptive optimizers. Under a Gaussian noise model, we show our estimator uniformly dominates the standard stochastic gradient under squared error loss and is minimax-optimal. We incorporate this into the Adam optimizer, yielding SR-Adam, a practical algorithm with negligible computational cost. Empirical evaluations on CIFAR10 and CIFAR100 across multiple levels of input noise show consistent improvements over Adam in the large-batch regime. Ablation studies indicate that gains arise primarily from selectively applying shrinkage to high-dimensional convolutional layers, while indiscriminate shrinkage across all parameters degrades performance. These results illustrate that classical shrinkage principles provide a principled approach to improving stochastic gradient estimation in deep learning.

artificial intelligence, machine learning, stochastic gradient estimation, (11 more...)

arXiv.org Machine Learning

Feb-10-2026

arXiv.org PDF

Add feedback

Country:
- Europe
  - Russia (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- Asia
  - Middle East > Iran (0.04)
  - Russia (0.04)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Mathematical & Statistical Methods (1.00)
  - Machine Learning
    - Statistical Learning > Gradient Descent (1.00)
    - Neural Networks (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found