Natasha 2: Faster Non-Convex Optimization Than SGD
–Neural Information Processing Systems
The diverse world of deep learning research has given rise to numerous architectures for neural networks (convolutional ones, long short-term memory ones, etc.). However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants.
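For concreteness, the SGD update the abstract refers to is w ← w − η·∇f_i(w), where ∇f_i is the gradient on a single randomly sampled example. A minimal sketch on a toy least-squares problem (the problem, step size, and iteration count are illustrative choices, not from the paper):

```python
import numpy as np

# Toy least-squares problem: minimize 0.5 * mean((A w - b)^2) over w.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
x_true = rng.normal(size=5)
b = A @ x_true

def loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

w = np.zeros(5)
eta = 0.05                            # step size (illustrative)
for step in range(500):
    i = rng.integers(0, 100)          # sample one data point at random
    grad = (A[i] @ w - b[i]) * A[i]   # stochastic gradient on that point
    w -= eta * grad                   # plain SGD step: w <- w - eta * grad
```

The heuristic variants mentioned (momentum, Adam, etc.) modify how `grad` is accumulated and scaled, but keep this same per-sample stochastic structure.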