Fast and Accurate Stochastic Gradient Estimation
Chen, Beidi, Xu, Yingchen, Shrivastava, Anshumali
Neural Information Processing Systems
Stochastic Gradient Descent (SGD) is the most popular optimization algorithm for large-scale problems. SGD estimates the gradient by uniform sampling with a sample size of one. Several other works suggest faster epoch-wise convergence by using weighted non-uniform sampling for better gradient estimates. Unfortunately, the per-iteration cost of maintaining this adaptive distribution for gradient estimation is higher than the cost of calculating the full gradient itself, which we call the chicken-and-the-egg loop. As a result, the false impression of faster convergence in iterations leads, in reality, to slower convergence in wall-clock time.
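The contrast between uniform and weighted sampling can be illustrated with a small sketch. This is not the authors' method; it is a minimal NumPy example that assumes a least-squares loss, and every name in it (`full_gradient`, `sampled_gradient`, `estimator_variance`, `norm_probs`) is hypothetical. It draws one example per step, reweights by 1 / (n * p_i) so the estimate stays unbiased, and compares the estimator variance under uniform probabilities with probabilities proportional to the per-example gradient norm.

```python
import numpy as np

def full_gradient(X, y, w):
    """Average gradient of 0.5 * (x_i . w - y_i)^2 over all n examples."""
    return X.T @ (X @ w - y) / len(y)

def sampled_gradient(X, y, w, probs, rng):
    """One-sample estimate: draw i ~ probs, then reweight by 1 / (n * probs[i])."""
    n = len(y)
    i = rng.choice(n, p=probs)
    grad_i = (X[i] @ w - y[i]) * X[i]
    return grad_i / (n * probs[i])

def estimator_variance(X, y, w, probs):
    """Exact total variance (trace of the covariance) of the one-sample estimator."""
    n = len(y)
    per_example = (X @ w - y)[:, None] * X        # row i is the gradient of example i
    scaled = per_example / (n * probs)[:, None]   # estimator value if i is drawn
    mean = probs @ scaled                         # equals the full gradient
    second_moment = probs @ np.sum(scaled**2, axis=1)
    return second_moment - mean @ mean

# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = np.zeros(d)

# Uniform sampling, as in plain SGD.
uniform = np.full(n, 1.0 / n)

# Weighted sampling proportional to per-example gradient norms.
# Building this distribution touches all n examples, i.e. it costs
# as much as a full gradient pass at every step.
norms = np.linalg.norm((X @ w - y)[:, None] * X, axis=1)
norm_probs = norms / norms.sum()

var_uniform = estimator_variance(X, y, w, uniform)
var_weighted = estimator_variance(X, y, w, norm_probs)
```

In this sketch the weighted distribution gives a lower-variance estimate, but recomputing `norm_probs` after each update requires every per-example gradient, which is the cost the abstract describes as the chicken-and-the-egg loop.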
Mar-19-2020, 01:45:52 GMT