Beating SGD Saturation with T ail-A veraging and Minibatching

Neural Information Processing Systems 

Stochastic gradient descent (SGD) provides a simple and yet stunningly efficient way to solve a broad range of machine learning problems.