Reviews: Shadowing Properties of Optimization Algorithms

Neural Information Processing Systems 

The paper presents several "shadowing" results for gradient descent (GD) and the heavy ball method (HB), under several conditions on the objective. In short, the authors provide conditions under which a discrete approximation of an ODE defines a trajectory that "stays close" to the actual trajectory of the ODE. This research is motivated by a recent paper by Su, Jordan, and Candes that models Nesterov's method via an ODE: this leads the authors to ask when an ODE solution is in fact a good approximation of the discrete algorithm, which is what would be implemented in practice. Although the interest and motivation are mostly in HB, the bulk of the results presented in the paper are for GD. The paper is well-written overall, and the results are interesting, if somewhat shallow.
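
To make the notion of "staying close" concrete, here is a minimal sketch that is not taken from the paper. It assumes a one-dimensional quadratic objective f(x) = a x^2 / 2, for which the gradient flow x'(t) = -a x(t) has the closed-form solution x(t) = x0 exp(-a t), and it compares GD iterates with step size h against the ODE solution started from the same point and sampled at times t = k h. The function names, the constants, and the choice of starting the ODE at the same point are illustrative assumptions; shadowing results in general may allow the ODE trajectory to start elsewhere.

```python
import math


def gd_iterates(x0, a, h, steps):
    """Gradient descent on f(x) = 0.5 * a * x**2 with step size h."""
    xs = [x0]
    for _ in range(steps):
        grad = a * xs[-1]           # f'(x) = a * x
        xs.append(xs[-1] - h * grad)
    return xs


def gradient_flow(x0, a, t):
    """Exact solution of the ODE x'(t) = -a * x(t) with x(0) = x0."""
    return x0 * math.exp(-a * t)


def max_gap(x0, a, h, horizon):
    """Largest distance between GD iterate k and the ODE at time k * h."""
    steps = int(round(horizon / h))
    xs = gd_iterates(x0, a, h, steps)
    return max(abs(x - gradient_flow(x0, a, k * h)) for k, x in enumerate(xs))


# Illustrative (assumed) settings: a = 1, x0 = 1, time horizon 10.
gap_coarse = max_gap(1.0, 1.0, 0.1, 10.0)
gap_fine = max_gap(1.0, 1.0, 0.01, 10.0)
print(f"h = 0.1:  max gap = {gap_coarse:.4f}")
print(f"h = 0.01: max gap = {gap_fine:.4f}")
```

In this toy setting the gap stays bounded uniformly over the whole horizon and shrinks as the step size decreases, which is the kind of behavior the shadowing results are meant to guarantee under the paper's conditions.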