Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Neural Information Processing Systems
We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer that combines centering of the second momentum with an asynchronous update (i.e., for the t-th update, the denominator uses information only up to step t-1, while the numerator uses the gradient at step t). ACProp has both strong theoretical properties and strong empirical performance. Using the example by Reddi et al. (2018), we show that asynchronous optimizers (e.g. AdaShift, ACProp) have a weaker convergence condition than synchronous optimizers (e.g. Adam, RMSProp, AdaBelief); among asynchronous optimizers, we show that centering of the second momentum further weakens the convergence condition.
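The abstract's description suggests an update rule of the following shape. This is a minimal NumPy sketch based solely on the two properties stated above (the function name, hyperparameter defaults, and initialization are illustrative assumptions, not the paper's reference implementation): the denominator is built from the second-momentum estimate of the *previous* step (asynchronous update), and that estimate tracks the deviation of the gradient from its EMA rather than the raw squared gradient (centering, as in AdaBelief).

```python
import numpy as np

def acprop_step(theta, grad, m, s_prev,
                lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative ACProp-style step (sketch, not the official algorithm).

    Asynchronous: the denominator uses s_prev, which only contains
    information up to step t-1, while the numerator uses the current
    gradient g_t. Centered: s tracks an EMA of (g_t - m_t)^2.
    """
    # Parameter update: current gradient over the *lagged* denominator.
    theta = theta - lr * grad / (np.sqrt(s_prev) + eps)
    # First momentum: EMA of gradients.
    m = beta1 * m + (1 - beta1) * grad
    # Centered second momentum: EMA of squared deviation from the mean.
    s = beta2 * s_prev + (1 - beta2) * (grad - m) ** 2
    return theta, m, s
```

For example, iterating this step on a simple quadratic f(x) = x^2 (with s initialized to a positive value to avoid a huge first step in this uncorrected sketch) drives the parameter toward zero.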