Reviews: Adaptive Methods for Nonconvex Optimization
–Neural Information Processing Systems
The paper bounds the expected gradient norm at an ergodic average of the iterates produced by the algorithms when applied to an L-smooth function, and these bounds converge to zero over time. The authors present several numerical results showing that their algorithm achieves state-of-the-art performance on a range of problems. In addition, it reaches this performance with little tuning, unlike classical SGD. One motivation for the work is a paper [27] showing that a recent adaptive algorithm, ADAM, can fail to converge even on simple convex problems when the batch size is kept fixed.
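For readers unfamiliar with the terminology, the kind of guarantee described above can be sketched as follows. The first line is the standard definition of L-smoothness. The second line is only the generic shape of such a bound: the symbols \bar{x}_T (the averaged iterate after T steps) and \epsilon(T) (the bound), as well as the use of the squared norm, are notation assumed here for illustration and are not taken from the paper.

```latex
% L-smoothness of f (standard definition):
\|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\| \quad \text{for all } x, y.

% Generic shape of the guarantee (assumed notation, not the paper's exact statement):
\mathbb{E}\big[\|\nabla f(\bar{x}_T)\|^2\big] \le \epsilon(T),
\qquad \epsilon(T) \to 0 \ \text{as } T \to \infty.
```

Since the function is nonconvex, a bound of this form certifies approach to a stationary point rather than to a global minimum.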
Oct-7-2024, 18:49:15 GMT