Natasha 2: Faster Non-Convex Optimization Than SGD

Zeyuan Allen-Zhu

Neural Information Processing Systems 

The diverse world of deep learning research has given rise to numerous architectures for neural networks (convolutional ones, long short-term memory ones, etc.). However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants.
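To make the baseline concrete, here is a minimal illustrative sketch of plain SGD, the workhorse algorithm the abstract refers to. This is not the paper's Natasha 2 method; the toy objective, step size, and step count are all assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch of plain SGD (NOT the paper's Natasha 2 algorithm):
# minimize f(x) = (1/n) * sum_i (x - a_i)^2 over a scalar x, using one
# randomly sampled component gradient per step. The minimizer is a.mean().
rng = np.random.default_rng(0)
a = rng.normal(size=100)  # toy data defining the objective

def sgd(steps=2000, lr=0.01):
    x = 0.0
    for _ in range(steps):
        i = rng.integers(len(a))   # sample one component uniformly
        grad = 2.0 * (x - a[i])    # stochastic gradient of f_i at x
        x -= lr * grad             # SGD update
    return x

x_hat = sgd()
print(x_hat, a.mean())  # x_hat hovers near the true minimizer
```

With a constant step size, the iterate does not converge exactly but fluctuates in a neighborhood of the minimizer whose radius scales with the learning rate; this variance is one of the issues that motivates variance-reduced methods such as the one in this paper.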
