Torque-Aware Momentum

Malviya, Pranshu, Mordido, Goncalo, Baratin, Aristide, Harikandeh, Reza Babanezhad, Dziugaite, Gintare Karolina, Pascanu, Razvan, Chandar, Sarath

Dec-25-2024–arXiv.org Artificial Intelligence

Efficiently exploring complex loss landscapes is key to the performance of deep neural networks. While momentum-based optimizers are widely used in stateof-the-art setups, classical momentum can still struggle with large, misaligned gradients, leading to oscillations. To address this, we propose Torque-Aware Momentum (TAM), which introduces a damping factor based on the angle between the new gradients and previous momentum, stabilizing the update direction during training. Empirical results show that TAM, which can be combined with both SGD and Adam, enhances exploration, handles distribution shifts more effectively, and improves generalization performance across various tasks, including image classification and large language model fine-tuning, when compared to classical momentum-based optimizers. Despite the wide range of optimization methods available in the literature, stochastic gradient descent (SGD), typically augmented with momentum (Kingma & Ba, 2015; Nesterov, 1983; Qian, 1999), remains the go-to approach for practitioners. Momentum accelerates convergence, particularly in the presence of high curvature (Cutkosky & Mehta, 2020b), small but consistent gradients, or noisy gradients. It also helps the optimizer navigate the loss landscape and escape local minima or saddle points by maintaining consistent updates directions (Jin et al., 2018). While SGD with momentum (SGDM) has shown remarkable success in various scenarios, particularly in computer vision (Sutskever et al., 2013), it remains vulnerable to In this work, we propose that minimizing the influence of misaligned gradients during momentum updates can preserve valuable information and improve the exploration Figure 1: Comparing momentum updates capabilities of momentum-based methods. To enable more obtained using SGDM and TAM consistent exploration of the loss landscape, particularly in for a given SGD trajectory.

machine learning, natural language, sgdm, (18 more...)

arXiv.org Artificial Intelligence

Dec-25-2024

arXiv.org PDF

Add feedback

Country:
- Asia (0.46)
- North America > Canada (0.28)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.86)
  - Natural Language (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found