Variance Reduction for Stochastic Gradient Optimization

Apr-6-2023, 11:53:01 GMT–Neural Information Processing Systems

Stochastic gradient optimization is a class of widely used algorithms for training machine learning models. To optimize an objective, it uses a noisy gradient computed from random data samples instead of the true gradient computed from the entire dataset. However, when the variance of the noisy gradient is large, the algorithm might spend much time bouncing around, leading to slower convergence and worse performance. In this paper, we develop a general approach that uses control variates for variance reduction in stochastic gradient optimization. Data statistics such as low-order moments (pre-computed or estimated online) are used to form the control variate.
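The control-variate idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy per-sample loss `exp(w * x)`, the function names, and the choice of a second-order Taylor expansion around `x = 0` as the control variate are all assumptions made for illustration. The point it shows is that a quantity correlated with the noisy gradient, whose expectation is available from the first and second data moments, can be subtracted to lower the variance without changing the mean.

```python
import numpy as np

def per_sample_grad(w, x):
    # Gradient with respect to w of the toy per-sample loss exp(w * x)
    # (an assumed loss, chosen only to keep the example one-dimensional).
    return x * np.exp(w * x)

def control_variate(w, x):
    # Second-order Taylor expansion of the gradient around x = 0.
    # Its mean depends only on the first and second moments of x.
    return x + w * x**2

def cv_gradients(w, x):
    g = per_sample_grad(w, x)
    h = control_variate(w, x)
    # Expectation of h, formed from pre-computed low-order moments.
    h_mean = x.mean() + w * (x**2).mean()
    # Variance-minimizing scalar coefficient a = Cov(g, h) / Var(h).
    a = np.mean((g - g.mean()) * (h - h.mean())) / np.var(h)
    # Corrected per-sample gradients: same mean as g, lower variance.
    return g, g - a * (h - h_mean), a

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.5, size=10_000)  # synthetic data, assumed
w = 0.3                                # current parameter value, assumed

g, g_cv, a = cv_gradients(w, x)
print("mean (plain, control variate):", g.mean(), g_cv.mean())
print("variance (plain, control variate):", g.var(), g_cv.var())
```

With the coefficient chosen this way, the variance of the corrected gradient equals the original variance multiplied by one minus the squared correlation between `g` and `h`, so the closer the control variate tracks the gradient, the larger the reduction. In this sketch the coefficient is estimated from the full dataset for clarity; in practice it would have to be estimated from samples.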

control variate, stochastic gradient optimization, variance reduction

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Mathematical & Statistical Methods (0.97)
  - Machine Learning > Statistical Learning
    - Gradient Descent (0.97)