The Lingering of Gradients: How to Reuse Gradients Over Time

Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang

Neural Information Processing Systems 

Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the "lingering" of gradients: once a gradient is computed at x