Reviews: SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques

Neural Information Processing Systems 

The paper is overall clearly written, but one important aspect of the algorithm is not sufficiently explained: how precisely the subspace optimization is carried out. The paper only mentions in passing that it uses conjugate gradient (CG), but a number of points deserve further clarification: a) Is CG run over a *single* larger minibatch, and how precisely is this minibatch chosen? b) Which version/implementation of CG do you use? c) The computational cost *and* the additional memory requirement of the subspace optimization need to be disclosed and made precise, as the latter can be a practical limitation for large nets.