Goto

Collaborating Authors

 Education






Adaptive Methods for Nonconvex Optimization

Neural Information Processing Systems

The first prominent algorithms in this line of research isADAGRAD [7,22], which uses a per-dimension learning rate based on squared pastgradients.ADAGRADachievedsignificant performance gainsincomparison toSGDwhenthe gradientsaresparse.