
Neural Information Processing Systems

Here we study structure-preserving discretizations for a certain class of dissipative (conformal) Hamiltonian systems, allowing us to analyze the symplectic structure of both Nesterov and heavy ball, and providing several new insights into these methods. Moreover, we propose a new algorithm based on a dissipative relativistic system that normalizes the momentum and may result in more stable/faster optimization.
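The paper's algorithm is derived from a discretized dissipative relativistic Hamiltonian system; as a rough illustration only, the sketch below combines heavy-ball-style dissipative momentum with a relativistic normalization of the step, which is the qualitative mechanism the abstract describes. The function name, parameter names, and the exact form of the normalization are assumptions, not the paper's definition.

```python
import numpy as np

def relativistic_gd(grad, x0, steps=500, lr=0.01, mu=0.9, delta=1.0):
    """Sketch of a dissipative relativistic update (hypothetical form):
    heavy-ball momentum whose displacement is normalized so the
    effective step length stays bounded even for large momenta."""
    x = np.asarray(x0, dtype=float)
    p = np.zeros_like(x)
    for _ in range(steps):
        p = mu * p - lr * grad(x)                      # dissipative momentum accumulation
        x = x + p / np.sqrt(delta * np.dot(p, p) + 1)  # relativistic normalization of the step
    return x

# Usage: an ill-conditioned quadratic, grad f(x) = diag(1, 50) @ x
grad = lambda x: np.array([1.0, 50.0]) * x
x_star = relativistic_gd(grad, [1.0, 1.0])
```

Near the optimum the momentum is small, the denominator approaches 1, and the update reduces to ordinary heavy ball; far from it, the normalization caps the step, which is the intuition behind the claimed stability benefit.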





Algorithmic Instabilities of Accelerated Gradient Descent

Neural Information Processing Systems

We disprove this conjecture and show, for two notions of algorithmic stability (including uniform stability), that the stability of Nesterov's accelerated method in fact deteriorates exponentially fast with the number of gradient steps.


Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation

Ali, Sarwan

arXiv.org Artificial Intelligence

Accelerated gradient methods like Nesterov's Accelerated Gradient (NAG) achieve faster convergence on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes due to aggressive momentum accumulation. We propose Heavy-Ball Synthetic Gradient Extrapolation (HB-SGE), a robust first-order method that combines heavy-ball momentum with predictive gradient extrapolation. Unlike classical momentum methods that accumulate historical gradients, HB-SGE estimates future gradient directions using local Taylor approximations, providing adaptive acceleration while maintaining stability. We prove convergence guarantees for strongly convex functions and demonstrate empirically that HB-SGE prevents divergence on problems where NAG and standard momentum fail. On ill-conditioned quadratics (condition number κ = 50), HB-SGE converges in 119 iterations while both SGD and NAG diverge. On the non-convex Rosenbrock function, HB-SGE achieves convergence in 2,718 iterations where classical momentum methods diverge within 10 steps. While NAG remains faster on well-conditioned problems, HB-SGE provides a robust alternative with speedup over SGD across diverse landscapes, requiring only O(d) memory overhead and the same hyperparameters as standard momentum.
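The abstract describes predicting future gradients via local Taylor approximations and feeding the prediction into a heavy-ball update. A minimal sketch of that idea, assuming the simplest first-order extrapolation (the predicted gradient is the current gradient plus its last finite difference); the function name, step sizes, and the exact extrapolation rule are illustrative assumptions, not the paper's HB-SGE definition:

```python
import numpy as np

def hb_sge_sketch(grad, x0, steps=2000, lr=0.005, mu=0.9):
    """Heavy-ball momentum driven by a linearly extrapolated gradient
    (first-order Taylor-style prediction from the last two gradients).
    Hypothetical parameterization for illustration only."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    g_prev = grad(x)                  # no history yet: first prediction is the plain gradient
    for _ in range(steps):
        g = grad(x)
        g_pred = g + (g - g_prev)     # predict the next gradient by linear extrapolation
        v = mu * v - lr * g_pred      # heavy-ball momentum on the predicted gradient
        x = x + v
        g_prev = g
    return x

# Usage: the κ = 50 ill-conditioned quadratic from the abstract, grad f(x) = diag(1, 50) @ x
grad = lambda x: np.array([1.0, 50.0]) * x
x_opt = hb_sge_sketch(grad, [1.0, 1.0])
```

The extrapolation term only needs the previous gradient, which matches the claimed O(d) memory overhead; on a quadratic it effectively anticipates the curvature-induced change in the gradient before the momentum step is taken.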


Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance

Zhong, Jincheng, Jiang, Boyuan, Tao, Xin, Wan, Pengfei, Gai, Kun, Long, Mingsheng

arXiv.org Artificial Intelligence

Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. In this paper, we identify a long-overlooked yet pervasive issue in this family of models: a misalignment between the pre-defined noise level and the actual noise level encoded in intermediate states during sampling. We refer to this misalignment as noise shift. Through empirical analysis, we demonstrate that noise shift is widespread in modern diffusion models and exhibits a systematic bias, leading to sub-optimal generation due to both out-of-distribution generalization and inaccurate denoising updates. To address this problem, we propose Noise Awareness Guidance (NAG), a simple yet effective correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. We further introduce a classifier-free variant of NAG, which jointly trains a noise-conditional and a noise-unconditional model via noise-condition dropout, thereby eliminating the need for external classifiers. Extensive experiments, including ImageNet generation and various supervised fine-tuning tasks, show that NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
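The classifier-free variant of NAG is described as jointly training noise-conditional and noise-unconditional models via noise-condition dropout and using both at sampling time. As a rough illustration, and assuming the combination follows the standard classifier-free-guidance form applied to the noise condition (the function names, the null-condition encoding, and the guidance weight are assumptions, not the paper's formulation):

```python
import numpy as np

def nag_guidance(eps_cond, eps_uncond, w=1.5):
    """Combine noise-conditional and noise-unconditional denoiser outputs
    in classifier-free-guidance style: steer the prediction toward the
    noise-aware (conditional) branch. The weight w is a hypothetical knob."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def maybe_drop_noise_condition(sigma, p_drop=0.1, null_value=-1.0, rng=None):
    """Training-time noise-condition dropout: with probability p_drop,
    replace the noise-level input by a null token so a single network
    learns both the conditional and unconditional predictions."""
    rng = rng if rng is not None else np.random.default_rng()
    return null_value if rng.random() < p_drop else sigma

# Sanity check: w = 0 recovers the unconditional branch, w = 1 the conditional one
eps_c = np.array([1.0, 0.0])
eps_u = np.array([0.0, 0.0])
```

One network with condition dropout replaces an external classifier, mirroring how classifier-free guidance removed the classifier from classifier guidance; here the "condition" is the noise level itself rather than a class label.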