Gradient Descent





Natasha 2: Faster Non-Convex Optimization Than SGD

Neural Information Processing Systems

The diverse world of deep learning research has given rise to numerous architectures for neural networks (convolutional ones, long short-term memory ones, etc.). However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants.






Parameter-free Clipped Gradient Descent Meets Polyak

Yuki Takezawa

Neural Information Processing Systems

Gradient descent and its variants are the de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparameters, we need to tune the hyperparameters carefully using a grid search.