Reviews: Non-convex Finite-Sum Optimization Via SCSG Methods

Neural Information Processing Systems 

Summary: This paper proposes a variant of a family of stochastic optimization algorithms SCSG (closely related to SVRG), and analyzes it in the context of non-convex optimization. The main difference is that the outer loop computes a gradient on a random subset of the data, and the stochastic gradients in the inner loop have a random cardinality and are not restricted to those participating in the outer gradient. The analysis of the stochastic gradients despite this mismatch is the main technical contribution of the paper. The result is a set of new bounds on the performance of the algorithm which improve on both sgd and variance reduced algorithms, at least in the low-medium accuracy regimes. Experimental evidence supports the claims of improvement in objective reduction (both training error and test error) per iteration both before the end of the first pass on data and after, both by SCSG over SGD and by varying batch size variant over the fixed size variant, but only on two networks applied to the venerable and small MNIST dataset. Pros of acceptance: - The claimed results are interesting and shed light on an important regime.