




Neural Information Processing Systems

We prove an O(t^{-1/2}) rate of convergence for the squared norm of the gradient of the Moreau envelope, which is the standard stationarity measure for this class of problems. It matches the known rates that adaptive algorithms enjoy in the specific case of unconstrained smooth nonconvex stochastic optimization.
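The stationarity measure referenced in this abstract can be written out explicitly. The notation below (a weakly convex objective $f$ and envelope parameter $\lambda > 0$) is an assumption for illustration, not taken from the abstract itself:

```latex
% Moreau envelope of f with parameter \lambda > 0
M_{\lambda} f(x) = \min_{y} \Big\{ f(y) + \tfrac{1}{2\lambda}\|x - y\|^2 \Big\}
% Near-stationarity of an iterate x_t is measured by the squared gradient
% norm of the envelope, which the abstract bounds at the stated rate:
\mathbb{E}\big[\|\nabla M_{\lambda} f(x_t)\|^2\big] = O(t^{-1/2})
```

The envelope is smooth even when $f$ is not, which is why its gradient norm serves as the stationarity measure for this problem class.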


Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation

Neural Information Processing Systems

While these algorithms have been extensively studied, both theoretically and practically (see, e.g., [10]), many questions remain open. In particular, most results cover only the case where the estimator of ∇V is unbiased.
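The setting the abstract alludes to can be sketched as a stochastic-approximation recursion; the symbols below ($\gamma_n$, $d_n$, $b_n$, $\mathcal{F}_n$) are assumed notation for illustration:

```latex
% Stochastic approximation with a possibly biased estimator d_{n+1} of \nabla V
\theta_{n+1} = \theta_n - \gamma_{n+1}\, d_{n+1},
\qquad
\mathbb{E}\big[\, d_{n+1} \mid \mathcal{F}_n \,\big] = \nabla V(\theta_n) + b_n
```

Setting the bias term $b_n = 0$ recovers the unbiased case that most prior results assume; the non-asymptotic analysis here concerns $b_n \neq 0$.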




ANO: Faster is Better in Noisy Landscape

Kegreisz, Adrien

arXiv.org Artificial Intelligence

Stochastic optimizers are central to deep learning, yet widely used methods such as Adam and Adan can degrade in non-stationary or noisy environments, partly due to their reliance on momentum-based magnitude estimates. We introduce Ano, a novel optimizer that decouples direction and magnitude: momentum is used for directional smoothing, while instantaneous gradient magnitudes determine step size. This design improves robustness to gradient noise while retaining the simplicity and efficiency of first-order methods. We further propose Anolog, which removes sensitivity to the momentum coefficient by expanding its window over time via a logarithmic schedule. We establish non-convex convergence guarantees with a convergence rate similar to other sign-based methods, and empirically show that Ano provides substantial gains in noisy and non-stationary regimes such as reinforcement learning, while remaining competitive on low-noise tasks.
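The direction/magnitude decoupling described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the update rule, the hyperparameter values, and the toy quadratic objective are all assumptions.

```python
import numpy as np

def ano_step(theta, m, grad, lr=0.05, beta=0.9):
    """One step of an Ano-style update (sketch; the published rule may differ).
    Momentum m smooths only the *direction* of the step, while the step
    *magnitude* comes from the instantaneous gradient |grad|."""
    m = beta * m + (1.0 - beta) * grad              # directional smoothing
    theta = theta - lr * np.abs(grad) * np.sign(m)  # |g_t| magnitude, sign(m_t) direction
    return theta, m

# Usage: minimize f(x) = 0.5 * x^2 under heavy gradient noise.
rng = np.random.default_rng(0)
theta, m = np.array([5.0]), np.zeros(1)
for _ in range(500):
    grad = theta + rng.normal(0.0, 0.5, size=1)  # noisy gradient of f
    theta, m = ano_step(theta, m, grad)
```

Because the magnitude term uses the raw gradient rather than a momentum-based estimate, a transient noise spike affects only the current step size, not the smoothed direction, which is the robustness property the abstract emphasizes.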