Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm
Deanna Needell, Rachel Ward, Nati Srebro
Neural Information Processing Systems
We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for smooth and strongly convex objectives, reducing a quadratic dependence on the strong convexity to a linear dependence. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence on average smoothness, dominating previous results; more broadly, we discuss how importance sampling for SGD can also improve convergence in other scenarios. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods.
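The SGD-Kaczmarz connection described above can be sketched concretely. The snippet below is a minimal illustration (not the paper's implementation): for the least-squares objective, SGD that samples row i with probability proportional to ||a_i||^2 (importance sampling) and takes the corresponding exact-projection step size is precisely the randomized Kaczmarz iteration, which converges linearly on a consistent system. All names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, steps=2000, seed=0):
    """SGD on f(x) = (1/2n) * ||Ax - b||^2 with importance sampling.

    Row i is drawn with probability p_i = ||a_i||^2 / ||A||_F^2; combined
    with the step size 1/||a_i||^2, the stochastic gradient step becomes
    the randomized Kaczmarz update: project x onto {x : a_i x = b_i}.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    row_norms = np.sum(A**2, axis=1)
    p = row_norms / row_norms.sum()  # importance-sampling distribution
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        # Kaczmarz projection step = reweighted stochastic gradient step
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

# Consistent overdetermined system: iterates converge linearly to x_true.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b)
```

Uniform sampling also converges here, but its rate degrades when row norms are highly non-uniform; the reweighted distribution yields the dependence on average (rather than worst-case) smoothness discussed in the abstract.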
Dec-31-2014