Fast and Accurate Stochastic Gradient Estimation

Chen, Beidi, Xu, Yingchen, Shrivastava, Anshumali

Mar-19-2020, 01:45:52 GMT–Neural Information Processing Systems

Stochastic Gradient Descent (SGD) is the most popular optimization algorithm for large-scale problems. SGD estimates the gradient by uniform sampling with a sample size of one. Several prior works suggest that weighted non-uniform sampling gives better gradient estimates and therefore faster epoch-wise convergence. Unfortunately, the per-iteration cost of maintaining this adaptive sampling distribution exceeds the cost of calculating the full gradient itself, a problem we call the chicken-and-the-egg loop. As a result, the apparent gain in convergence per iteration translates, in reality, into slower convergence in wall-clock time.
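
The contrast between uniform and weighted sampling can be illustrated with a minimal sketch. It is not the paper's method: it assumes a least-squares loss, synthetic data, and a sampling distribution proportional to each per-sample gradient norm, and all function names are hypothetical. Both one-sample estimators are unbiased, and the weighted one has lower variance, but building its distribution requires every per-sample gradient, which is the cost the abstract describes.

```python
import numpy as np

def per_sample_gradients(X, y, w):
    """Gradients of 0.5 * (x_i . w - y_i)^2 for every sample, shape (N, d)."""
    residuals = X @ w - y
    return residuals[:, None] * X

def uniform_estimate(grads, rng):
    """Plain SGD estimate: one index drawn uniformly at random."""
    i = rng.integers(len(grads))
    return grads[i]

def norm_proportional_probs(grads):
    """Assumed weighting scheme: probability proportional to gradient norm.

    Note that this needs all N per-sample gradients, i.e. it costs as much
    as computing the full gradient.
    """
    norms = np.linalg.norm(grads, axis=1)
    return norms / norms.sum()

def weighted_estimate(grads, probs, rng):
    """Draw index i with probability probs[i], reweight by 1 / (N * probs[i])."""
    i = rng.choice(len(grads), p=probs)
    return grads[i] / (len(grads) * probs[i])

def estimator_mean(grads, probs):
    """Exact expectation of the one-sample estimator under probs."""
    n = len(grads)
    scaled = grads / (n * probs[:, None])
    return np.sum(probs[:, None] * scaled, axis=0)

def estimator_variance(grads, probs):
    """Exact total variance of the one-sample estimator under probs."""
    n = len(grads)
    full = grads.mean(axis=0)
    scaled = grads / (n * probs[:, None])
    return float(np.sum(probs * np.sum((scaled - full) ** 2, axis=1)))

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)
w = np.zeros(5)

grads = per_sample_gradients(X, y, w)
full_gradient = grads.mean(axis=0)
uniform_probs = np.full(len(grads), 1.0 / len(grads))
weighted_probs = norm_proportional_probs(grads)

print("uniform variance: ", estimator_variance(grads, uniform_probs))
print("weighted variance:", estimator_variance(grads, weighted_probs))
```

The variance is computed exactly from the distribution rather than by Monte Carlo, so the comparison does not depend on the random draws made by `uniform_estimate` or `weighted_estimate`.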

accurate stochastic gradient estimation, convergence, stochastic gradient descent

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Mathematical & Statistical Methods (0.68)
  - Machine Learning > Statistical Learning
    - Gradient Descent (1.00)