From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees

Shengping Xie, Chuyan Chen, Kun Yuan

arXiv.org Artificial Intelligence 

Low-rank gradient compression methods, such as PowerSGD, have gained attention in communication-efficient distributed optimization. However, the convergence guarantees of PowerSGD remain unclear, particularly in stochastic settings. In this paper, we show that PowerSGD does not always converge to the optimal solution and provide a clear counterexample to support this finding. To address this, we introduce PowerSGD+, which periodically updates the projection subspace via singular value decomposition, ensuring that it remains aligned with the optimal subspace. We prove that PowerSGD+ converges under standard assumptions and validate its effectiveness through empirical evaluation on large language model tasks.
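To make the idea concrete, below is a minimal, self-contained sketch (not the authors' reference implementation) of rank-r low-rank gradient compression in the spirit of PowerSGD, with a PowerSGD+-style periodic refresh of the projection subspace via an SVD of the current gradient. Class and parameter names such as `LowRankCompressor`, `rank`, and `refresh_every` are illustrative assumptions, and the communication (all-reduce of the small factors) is only indicated in comments.

```python
# Sketch of rank-r gradient compression with a periodic SVD subspace refresh.
# Assumes a 2-D gradient (e.g. a weight matrix of a linear layer) and
# rank <= min(grad.shape). Not the paper's exact algorithm.
import torch


class LowRankCompressor:
    def __init__(self, rank: int = 4, refresh_every: int = 100):
        self.rank = rank                  # compression rank r
        self.refresh_every = refresh_every  # how often to realign via SVD
        self.step = 0
        self.q = None                     # projection basis, shape (n, rank)

    def compress_decompress(self, grad: torch.Tensor) -> torch.Tensor:
        m, n = grad.shape
        if self.q is None:
            # Random initial subspace, as in PowerSGD.
            self.q = torch.randn(n, self.rank, device=grad.device, dtype=grad.dtype)

        if self.step % self.refresh_every == 0:
            # PowerSGD+-style periodic refresh: realign the projection
            # subspace with the top right singular vectors of the gradient.
            _, _, vh = torch.linalg.svd(grad, full_matrices=False)
            self.q = vh[: self.rank].T.contiguous()

        # One PowerSGD-style power-iteration step:
        # P = G Q, orthogonalize P, Q = G^T P, and use P Q^T as the
        # decompressed gradient. In a distributed run, the small factors
        # P and Q are what would be all-reduced instead of G itself.
        p = grad @ self.q                 # (m, rank)
        p, _ = torch.linalg.qr(p)         # orthonormal left factor
        self.q = grad.T @ p               # (n, rank)
        self.step += 1
        return p @ self.q.T               # rank-r approximation of grad
```

The point of the periodic schedule is cost: a full SVD is far more expensive than a single power-iteration step, so under this sketch it is run only every `refresh_every` steps to pull the subspace back toward the gradient's dominant directions, while the cheap PowerSGD-style iteration is used in between.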
