Adaptive Methods for Nonconvex Optimization

Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

Neural Information Processing Systems 

The first prominent algorithms in this line of research isADAGRAD [7,22], which uses a per-dimension learning rate based on squared pastgradients.ADAGRADachievedsignificant performance gainsincomparison toSGDwhenthe gradientsaresparse.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found