ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang, Scott Sievert, Shengchao Liu, Zachary Charles, Dimitris Papailiopoulos, Stephen Wright
Neural Information Processing Systems
Distributed model training suffers from communication overhead due to the frequent gradient updates transmitted between compute nodes. To mitigate this overhead, several studies propose the use of sparsified stochastic gradients. We argue that these approaches are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions.
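As a concrete illustration of sparsification over an atomic decomposition, the following is a minimal sketch (not the authors' reference implementation) using the singular value decomposition: each rank-one atom is kept independently with a fixed probability and rescaled so the sparsified gradient is unbiased. The uniform keep probability here is an assumption made for brevity; the paper's method instead allocates per-atom probabilities under a sparsity budget.

```python
import numpy as np

def sparsify_svd(grad, keep_prob=0.5, rng=np.random.default_rng()):
    """Sparsify `grad` over its singular value decomposition.

    Each rank-one atom s_i * u_i v_i^T is kept with probability
    `keep_prob` and rescaled by 1 / keep_prob, so the result is an
    unbiased, lower-rank estimate of `grad`.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    kept = rng.random(s.shape) < keep_prob          # Bernoulli mask over atoms
    scaled = np.where(kept, s / keep_prob, 0.0)     # unbiased rescaling of kept atoms
    return (u * scaled) @ vt                        # recombine surviving atoms

# Usage: the sparsified gradient equals the original in expectation.
g = np.random.randn(64, 32)
g_hat = sparsify_svd(g, keep_prob=0.25)
```

An element-wise decomposition would follow the same pattern, with individual gradient entries playing the role of atoms.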