SGD: The Role of Implicit Regularization, Batch-size and Multiple Epochs

Neural Information Processing Systems 

Our main contributions are threefold:

1. We show that for any regularizer, there is a stochastic convex optimization (SCO) problem for which Regularized Empirical Risk Minimization (RERM) fails to learn.
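To make the objects in this claim concrete, the block below states the standard SCO setup and the RERM estimator in one common notation. The symbols ($\mathcal{D}$, $\mathcal{W}$, $f$, $F$, $\widehat{F}_S$, $R$) are assumed for illustration and are not taken from this section. "Fails to learn" is read here in the usual sense that the excess population risk does not vanish as the sample size grows. This is a sketch of the definitions only, not the paper's lower-bound construction.

```latex
% Assumed notation: sample S = \{z_1,\dots,z_n\} drawn i.i.d. from \mathcal{D};
% loss f(w; z) convex in w over a convex set \mathcal{W}.
F(w) = \mathbb{E}_{z \sim \mathcal{D}}\big[f(w; z)\big],
\qquad
\widehat{F}_S(w) = \frac{1}{n} \sum_{i=1}^{n} f(w; z_i)

% RERM with regularizer R : \mathcal{W} \to \mathbb{R}
\widehat{w}_{\mathrm{RERM}} \in \operatorname*{arg\,min}_{w \in \mathcal{W}}
  \Big\{ \widehat{F}_S(w) + R(w) \Big\}

% "Fails to learn": the expected excess risk does not tend to zero as n grows
\mathbb{E}_S\Big[ F(\widehat{w}_{\mathrm{RERM}}) - \min_{w \in \mathcal{W}} F(w) \Big]
  \not\to 0 \quad \text{as } n \to \infty
```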