Reviews: Barzilai-Borwein Step Size for Stochastic Gradient Descent

Neural Information Processing Systems 

It allows using "Option I" (taking the final iterate of the inner iteration), as is done in practice. They also propose using a scaled version of the Barzilai-Borwein (BB) method to set the step-size for SVRG, and argue heuristically that this could also be useful for classic stochastic gradient (SG) methods. Their experiments show that this adaptive step-size is competitive with fixed step-sizes.

Clarity: The paper is clearly written and easy to understand, though many grammar issues remain.

Significance: Although several heuristic adaptive step-size strategies exist in the literature, this is the first theoretically-justified method. It still depends on constants that we don't know in general, but I believe it is a step towards black-box SG methods.

Details: Independent of the SVRG/SG results, the authors give a nice way to bound the step-size for the BB method. Normally, BB leads to a much faster rate than using a constant step-size, but in the SVRG setting your theory/experiments only show that it does as well as the best step-size (which is good, but it isn't better than the best step-size). Finally, the paper would be much stronger if it compared to the two existing strategies that are used in practice: 1.