Reviews: The Implicit Bias of AdaGrad on Separable Data

Neural Information Processing Systems 

The main contribution of the paper is a characterization of the implicit bias of AdaGrad on linear classification problems with the logistic loss (and other losses). The authors show that AdaGrad converges to a direction that solves a quadratic optimization problem depending on the initial conditions, the hyperparameters, and the data itself, unlike gradient descent, which converges to the max-margin direction. They also give a few toy examples to demonstrate the properties of the AdaGrad solution and how it differs from the gradient descent direction. Significance: Understanding the implicit bias that various optimization algorithms impose on the solution is an important problem, and several recent works have pursued this direction. The motivation comes from understanding the generalization abilities of overparametrized deep neural networks.
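
To make the comparison in the summary concrete, the following is a minimal sketch, not the paper's experiment. It runs plain gradient descent and a diagonal AdaGrad update on the mean logistic loss for a small separable dataset and prints the normalized weight direction each one reaches. The dataset, step size, step count, `eps`, and the function names are all assumptions chosen for illustration, and the AdaGrad variant analyzed in the paper may differ in its details. Convergence in direction is slow, so a finite run only approximates the limiting directions.

```python
import numpy as np

# Hypothetical toy data (not from the paper): linearly separable, labels in {-1, +1}.
X = np.array([[1.0, 0.2], [0.5, 3.0], [-1.0, -0.3], [-0.4, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])


def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss log(1 + exp(-y * <w, x>))."""
    margins = y * (X @ w)
    weights = 1.0 / (1.0 + np.exp(margins))  # equals sigmoid(-margin)
    return -(X * (y * weights)[:, None]).mean(axis=0)


def run_gd(X, y, lr=0.1, steps=5000):
    """Plain gradient descent from w = 0; returns the unit-norm direction."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * logistic_grad(w, X, y)
    return w / np.linalg.norm(w)


def run_adagrad(X, y, lr=0.1, steps=5000, eps=1e-8):
    """Diagonal AdaGrad from w = 0 (assumed variant); returns the unit-norm direction."""
    w = np.zeros(X.shape[1])
    g2 = np.zeros_like(w)  # running sum of squared gradients, per coordinate
    for _ in range(steps):
        g = logistic_grad(w, X, y)
        g2 += g ** 2
        w -= lr * g / (np.sqrt(g2) + eps)  # per-coordinate step scaling
    return w / np.linalg.norm(w)


gd_dir = run_gd(X, y)
ada_dir = run_adagrad(X, y)
print("GD direction:     ", gd_dir)
print("AdaGrad direction:", ada_dir)
print("min margins:", (y * (X @ gd_dir)).min(), (y * (X @ ada_dir)).min())
```

Both runs separate the data; the point of interest is whether the two printed directions coincide. Changing `lr`, `eps`, or the starting point in `run_adagrad` is one way to probe the dependence on hyperparameters and initial conditions that the paper describes.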