Reviews: Connecting Optimization and Regularization Paths

Neural Information Processing Systems 

The authors explore the relationship between the trajectory of Gradient Descent (GD) initialized at the origin and the regularization path of l2-regularized minimization of the same objective. They first study the continuous-time setting, where GD is replaced by Gradient Flow and the objective is assumed to be smooth and strongly convex. The main result (Theorem 1, whose proof I have verified) is as follows: under an appropriate scaling between the time t in Gradient Flow and the inverse regularization parameter \eta, the two trajectories remain close. This result is obtained by quantifying the shrinkage of the gradients as t and \eta tend to infinity; in the continuous-time setting, the authors manage to reduce this task to formulating and solving certain ODEs.
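To make the claim concrete, here is a minimal numerical sketch of the phenomenon for the special case of least squares, where both paths have closed forms. I assume the scaling identifies \eta with t, i.e., the ridge weight is 1/t; the closed-form expressions below are standard facts about quadratics, not taken from the paper.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
H = A.T @ A          # Hessian of f(x) = 0.5*||Ax - b||^2 (strongly convex w.h.p.)
g = A.T @ b

def gradient_flow(t):
    # Gradient flow x'(t) = -(H x - g) from x(0) = 0 has the closed form
    # x(t) = (I - e^{-tH}) H^{-1} g.
    return (np.eye(5) - expm(-t * H)) @ np.linalg.solve(H, g)

def ridge(lam):
    # Point on the l2-regularization path: argmin_x f(x) + (lam/2) ||x||^2.
    return np.linalg.solve(H + lam * np.eye(5), g)

# With the (assumed) scaling lam = 1/t, the two paths stay close and the
# gap shrinks as t grows.
for t in [1.0, 10.0, 100.0]:
    print(t, np.linalg.norm(gradient_flow(t) - ridge(1.0 / t)))
```

The gap printed above is driven by the shrinking gradient norms along both paths, which is exactly the quantity the authors control in the general smooth strongly convex case.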