On the Interaction of Noise, Compression Role, and Adaptivity under $(L_0, L_1)$-Smoothness: An SDE-based Approach

Enea Monzio Compagnoni, Rustem Islamov, Antonio Orvieto, Eduard Gorbunov

Jun-3-2025–arXiv.org Machine Learning

Using stochastic differential equation (SDE) approximations, we study the dynamics of Distributed SGD, Distributed Compressed SGD, and Distributed SignSGD under $(L_0,L_1)$-smoothness and flexible noise assumptions. Our analysis provides insights, which we validate through simulation, into the intricate interactions among batch noise, stochastic gradient compression, and adaptivity in this modern theoretical setup. For instance, we show that \textit{adaptive} methods such as Distributed SignSGD can converge under standard assumptions on the learning rate scheduler, even under heavy-tailed noise. In contrast, Distributed (Compressed) SGD with a pre-scheduled decaying learning rate fails to converge unless the schedule also includes an inverse dependence on the gradient norm, which de facto turns it back into an adaptive method.
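
The contrast drawn above can be explored with a small toy simulation. This is a sketch under stated assumptions, not the authors' code or experimental setup: it assumes the test function f(x) = Σ_i cosh(x_i), which is $(L_0,L_1)$-smooth with $L_0 = L_1 = 1$; symmetric Cauchy gradient noise as a stand-in for heavy-tailed noise; a pre-scheduled step size η_t = η_0/√(t+1); a server that averages the workers' messages (for SignSGD, the average of the workers' signs); and an unbiased random-k sparsifier as the compressor. The names `run`, `rand_k`, `grad_f`, `cauchy`, and `sign` are hypothetical.

```python
import math
import random


def grad_f(x):
    # Gradient of f(x) = sum_i cosh(x_i). This f is (L0, L1)-smooth with
    # L0 = L1 = 1 because cosh(t) <= 1 + |sinh(t)| for every real t.
    return [math.sinh(xi) for xi in x]


def cauchy(rng, scale):
    # Symmetric, infinite-variance noise sampled by inverting the Cauchy CDF.
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def rand_k(v, k, rng):
    # Assumed compressor: unbiased random-k sparsifier. Keep k random
    # coordinates and rescale them by d / k so that E[rand_k(v)] = v.
    d = len(v)
    keep = set(rng.sample(range(d), k))
    return [(d / k) * vi if i in keep else 0.0 for i, vi in enumerate(v)]


def sign(t):
    return float((t > 0) - (t < 0))


def run(method, x0, steps=2000, workers=4, lr0=0.1, noise=1.0, k=2,
        seed=0, blowup=50.0):
    """Simulate one toy distributed method; return (final iterate, diverged)."""
    rng = random.Random(seed)
    x = list(x0)
    for t in range(steps):
        if max(abs(xi) for xi in x) > blowup:
            return x, True  # stop before sinh overflows
        lr = lr0 / math.sqrt(t + 1)  # pre-scheduled decaying learning rate
        g = grad_f(x)
        update = [0.0] * len(x)
        for _ in range(workers):
            # Each worker sees the true gradient plus its own heavy-tailed noise.
            gi = [gj + cauchy(rng, noise) for gj in g]
            if method == "sgd":
                msg = gi
            elif method == "compressed_sgd":
                msg = rand_k(gi, k, rng)
            elif method == "signsgd":
                msg = [sign(v) for v in gi]
            else:
                raise ValueError(f"unknown method: {method}")
            # The server averages the workers' messages.
            update = [u + m / workers for u, m in zip(update, msg)]
        x = [xi - lr * u for xi, u in zip(x, update)]
    return x, max(abs(xi) for xi in x) > blowup


if __name__ == "__main__":
    for method in ("sgd", "compressed_sgd", "signsgd"):
        x, diverged = run(method, [3.0] * 5)
        print(method, "diverged" if diverged else f"max|x_i| = {max(abs(v) for v in x):.3g}")
```

Under these assumptions, each SignSGD coordinate moves by at most η_t per step, so its iterates stay bounded no matter how large a noise sample is. For SGD, the mean of several Cauchy samples is again Cauchy with the same scale, so averaging across workers does not tame the noise, and a single large draw can push an iterate into a region where sinh(x_i) is huge and the fixed schedule overshoots. Whether a given seed actually diverges is random; comparing many seeds is a more faithful check than any single run.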

artificial intelligence, international conference, machine learning
