Adam
Training a neural network consists of optimizing stochastic functions in a high-dimensional space. In this context, finding a computationally and memory-efficient method with fast convergence properties is hard. On one hand, the objective functions encountered in Deep Learning are in most cases non-convex and non-stationary. On the other hand, gradients can be noisy and/or sparse. For efficient stochastic optimization, Adam, just like RMSProp and AdaGrad, relies only on first-order gradients.
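As a minimal sketch of the idea, the update rule below implements a single Adam step with NumPy. It maintains exponential moving averages of the gradient (first moment) and the squared gradient (second moment), applies bias correction, and scales the step by the square root of the second moment; the function name and defaults follow the commonly published hyperparameters, but this is an illustrative implementation, not a reference one.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using only first-order gradient information.

    theta: current parameters; grad: gradient at theta
    m, v: running first/second moment estimates; t: step count (1-indexed)
    """
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Because the per-coordinate step is normalized by the second-moment estimate, the effective step size is roughly bounded by the learning rate, which is what makes the method robust to noisy and sparse gradients.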
Nov-29-2022, 23:25:07 GMT