HOME-3: High-Order Momentum Estimator with Third-Power Gradient for Convex and Smooth Nonconvex Optimization
Wei Zhang, Arif Hassan Zidan, Afrar Jahin, Yu Bao, Tianming Liu
arXiv.org Artificial Intelligence
Momentum-based gradients are essential for optimizing advanced machine learning models: they not only accelerate convergence but also help optimizers escape stationary points. While most state-of-the-art momentum techniques use lower-order gradients, such as the squared first-order gradient, higher-order gradients, particularly those raised to powers greater than two, remain largely unexplored. In this work, we introduce the concept of high-order momentum, in which momentum is constructed from higher-power gradients, focusing on the third power of the first-order gradient as a representative case. Our research offers both theoretical and empirical support for this approach. Theoretically, we show that incorporating third-power gradients improves the convergence bounds of gradient-based optimizers for both convex and smooth nonconvex problems. Empirically, we validate these findings through extensive experiments across convex, smooth nonconvex, and nonsmooth nonconvex optimization tasks. In all cases, high-order momentum consistently outperforms conventional low-order momentum methods, demonstrating superior performance across a range of optimization problems.
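The abstract does not give the update rule, but a minimal sketch of what a third-power momentum step might look like can be built by analogy with Adam: the second-moment buffer accumulates the absolute gradient cubed instead of the gradient squared, and a cube root (rather than a square root) restores the gradient's scale in the denominator. The function name `home3_step` and all hyperparameter values below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def home3_step(params, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical third-power-momentum update (sketch from the abstract,
    not the paper's HOME-3 algorithm). Adam-style, but the second buffer
    tracks |g|^3 and is rescaled with a cube root."""
    m, v, t = state["m"], state["v"], state["t"] + 1
    # first-moment estimate, as in standard momentum
    m = beta1 * m + (1 - beta1) * grad
    # third-power momentum: |g|^3 replaces Adam's g^2 (assumption)
    v = beta2 * v + (1 - beta2) * np.abs(grad) ** 3
    # bias correction, mirroring Adam
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # cube root restores the gradient scale, as the square root does in Adam
    params = params - lr * m_hat / (np.cbrt(v_hat) + eps)
    state.update(m=m, v=v, t=t)
    return params, state

# usage: minimize f(p) = p^2 from p = 1
p = np.array([1.0])
state = {"m": np.zeros(1), "v": np.zeros(1), "t": 0}
for _ in range(200):
    g = 2 * p  # gradient of p^2
    p, state = home3_step(p, g, state, lr=0.05)
```

On this toy quadratic the iterate is driven toward the minimizer at zero; the cube-root normalization keeps the effective step size on the order of the learning rate, just as Adam's square-root normalization does.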
May-20-2025