Reviews: Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Neural Information Processing Systems 

The fundamental claim [line 101 & 239] is that asymptotically, for streaming PCA, the delay tau is allowed to scale as (1 - mu)^2 / sqrt(eta), where eta is the step size and mu the momentum parameter.

Major Comments

Before we discuss the proof, I think the introduction is somewhat misleading. In line 76, the authors point out that previous works all focus on analyzing convergence to a first-order optimal solution. Readers may be misled into thinking that this paper improves on the results of that previous work. However, the problems studied in those papers and streaming PCA are different.
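
To make the scaling in the fundamental claim concrete, here is a minimal sketch that evaluates (1 - mu)^2 / sqrt(eta) for a few values. The function name `delay_scale` and the constant `c` are hypothetical and introduced only for illustration. The claim is asymptotic and fixes no proportionality constant, so the numbers indicate relative scaling only, not an actual bound on tau.

```python
import math


def delay_scale(mu: float, eta: float, c: float = 1.0) -> float:
    """Return c * (1 - mu)^2 / sqrt(eta), the claimed scale of the tolerable delay tau.

    mu  : momentum parameter, assumed to lie in [0, 1)
    eta : step size, assumed positive
    c   : hypothetical proportionality constant (not given in the source)
    """
    if not 0.0 <= mu < 1.0:
        raise ValueError("mu must be in [0, 1)")
    if eta <= 0.0:
        raise ValueError("eta must be positive")
    return c * (1.0 - mu) ** 2 / math.sqrt(eta)


if __name__ == "__main__":
    # Compare how the scale changes with momentum and step size.
    for eta in (1e-2, 1e-4):
        for mu in (0.0, 0.5, 0.9):
            print(f"eta={eta:g}  mu={mu:.1f}  scale={delay_scale(mu, eta):.3f}")
```

Under this reading, raising the momentum shrinks the tolerable delay quadratically in (1 - mu), while decreasing the step size enlarges it only as 1 / sqrt(eta).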