On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
Xianliang Li, Jun Luo, Zhiwei Zheng, Hanxiao Wang, Li Luo, Lingkun Wen, Linlong Wu, Sheng Xu
arXiv.org Artificial Intelligence
Momentum-based optimizers are widely adopted for training neural networks. However, the optimal selection of momentum coefficients remains elusive. This uncertainty impedes a clear understanding of the role of momentum in stochastic gradient methods. In this paper, we present a frequency-domain analysis framework that interprets the momentum method as a time-variant filter for gradients, where adjustments to the momentum coefficients modify the filter's characteristics. Our experiments support this perspective and provide a deeper understanding of the underlying mechanism. Moreover, our analysis reveals the following significant findings: high-frequency gradient components are undesired in the late stages of training, while preserving the original gradient in the early stages and gradually amplifying low-frequency gradient components during training both enhance generalization performance. Based on these insights, we propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), a heuristic optimizer that dynamically adjusts the momentum filtering characteristic with an empirically effective dynamic magnitude response. Experimental results demonstrate the superiority of FSGDM over conventional momentum optimizers.
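The filter interpretation in the abstract rests on a standard identity: heavy-ball momentum with coefficient β is a first-order IIR low-pass filter (an exponential moving average) over the gradient sequence, with transfer function H(z) = 1/(1 − βz⁻¹). The sketch below illustrates this general fact only; the function names are illustrative and not taken from the paper, and it does not implement the proposed FSGDM optimizer.

```python
import numpy as np

def momentum_filter(grads, beta):
    """Apply heavy-ball momentum, m_t = beta * m_{t-1} + g_t,
    to a gradient sequence.

    This is a first-order IIR low-pass filter: larger beta
    attenuates high-frequency gradient components more strongly,
    while the DC (zero-frequency) gain grows as 1 / (1 - beta).
    """
    m = 0.0
    out = []
    for g in grads:
        m = beta * m + g
        out.append(m)
    return np.array(out)

# A noisy gradient signal: a slow sinusoidal trend plus
# high-frequency noise, standing in for stochastic gradients.
t = np.arange(256)
rng = np.random.default_rng(0)
grads = np.sin(2 * np.pi * t / 128) + 0.5 * rng.standard_normal(256)

smooth = momentum_filter(grads, beta=0.9)   # low-pass filtered
rough = momentum_filter(grads, beta=0.0)    # beta=0 recovers plain SGD
```

Changing β over the course of training, as the abstract describes, therefore amounts to reshaping this filter's magnitude response on the fly: β near 0 passes the raw gradient through, while β near 1 suppresses high-frequency components and amplifies the low-frequency ones.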
Nov-29-2024