Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

Deanna Needell, Rachel Ward, Nati Srebro

Neural Information Processing Systems 

We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for smooth and strongly convex objectives, reducing a quadratic dependence on the strong convexity to a linear dependence. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further reduce the variance of the stochastic gradient. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods.
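For concreteness, here is a minimal sketch (not taken from the paper) of the randomized Kaczmarz algorithm with rows sampled proportionally to their squared norms, the standard weighted sampling scheme of Strohmer and Vershynin that underlies the SGD connection described above. Function names and parameters are illustrative assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iters=5000, seed=0):
    """Randomized Kaczmarz for a consistent linear system Ax = b.

    Rows are sampled with probability proportional to their squared
    norms, i.e. the weighted (importance) sampling distribution; each
    update is equivalently an SGD step on the least-squares objective.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # ||a_i||^2 for each row
    probs = row_norms_sq / row_norms_sq.sum()   # importance weights
    x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.choice(m, p=probs)              # weighted row selection
        a_i = A[i]
        # Project the iterate onto the hyperplane <a_i, x> = b_i.
        x += (b[i] - a_i @ x) / row_norms_sq[i] * a_i
    return x

# Usage: recover x* from a consistent overdetermined system.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_star = rng.standard_normal(20)
b = A @ x_star
x_hat = randomized_kaczmarz(A, b)
print(np.linalg.norm(x_hat - x_star))  # error decays linearly in n_iters
```

For a consistent system, this iteration converges linearly at a rate governed by the condition number of A, which is the Kaczmarz-side analogue of the linear-dependence guarantee discussed in the abstract.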