Reviews: New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Neural Information Processing Systems 

This paper lies in the well-developed field of variance-reduced stochastic gradient algorithms. It proposes a theoretical study of the well-known HSGD for sampling without replacement scheme, which is known to perform better than sampling with replacement in practice. The considered setting makes the analysis of the algorithm more difficult, since in this case the mini-batch gradients are not unbiased anymore. Rates for strongly-convex, non-strongly convex and non-convex objectives are provided, with a special care for linearly structured problems (such as generalized linear models). Numerical experiments are convincing enough, although the datasets used in experiments are not very large (largest one is covtype) (while authors argue that this methods is interesting in the large n medium precision setting).