STORM+: Fully Adaptive SGD with Recursive Momentum for Nonconvex Optimization

Jan-18-2025, 15:26:27 GMT–Neural Information Processing Systems

In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to such problems is variance reduction, which is also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require careful maintenance of anchor points in conjunction with appropriately selected "mega-batch sizes". This leads to a challenging hyperparameter tuning problem, which weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batch sizes, and still obtain the optimal rate for this setting.
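To make the recursive momentum idea concrete, here is a minimal sketch of the gradient estimator d_t = g(x_t; z_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; z_t)), where the same fresh sample z_t is evaluated at both the current and the previous iterate. This is an illustration only, not the authors' algorithm: the toy quadratic loss, the function names, and the fixed values of `lr` and `a` are assumptions made here, whereas the methods discussed above choose the step size and momentum parameter through schedules.

```python
import random


def stochastic_grad(x, z):
    # Gradient in x of the hypothetical toy loss f(x; z) = 0.5 * (x - z)**2.
    return x - z


def recursive_momentum_sgd(x0, steps=2000, lr=0.01, a=0.1, seed=0):
    # lr and a are fixed constants here (an assumption for illustration).
    rng = random.Random(seed)
    x_prev = x0
    # The first estimate is a plain stochastic gradient.
    d = stochastic_grad(x_prev, rng.gauss(1.0, 1.0))
    x = x_prev - lr * d
    for _ in range(steps):
        z = rng.gauss(1.0, 1.0)  # one fresh sample, reused at both points
        g_new = stochastic_grad(x, z)
        g_old = stochastic_grad(x_prev, z)
        # Recursive momentum: correct the old estimate with the gradient
        # difference measured on the same sample. No anchor point and no
        # large batch are needed.
        d = g_new + (1.0 - a) * (d - g_old)
        x_prev, x = x, x - lr * d
    return x


if __name__ == "__main__":
    print(recursive_momentum_sgd(5.0))
```

The expected loss in this toy problem is minimized at the sample mean, 1.0, so the returned iterate should end up near that value. Each step uses a single sample, which is the practical contrast with the anchor-point and mega-batch schemes mentioned above.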

adaptive sgd, nonconvex optimization, recursive momentum

Technology:
- Information Technology > Artificial Intelligence (0.61)