Subspace Networks: Scaling Decentralized Training with Communication-Efficient Model Parallelism

Jun-14-2026, 08:17:43 GMT–Neural Information Processing Systems

Scaling models has led to significant advancements in deep learning, but training these models in decentralized settings remains challenging due to communication bottlenecks. While existing compression techniques are effective in data-parallel training, they do not extend to model parallelism. Unlike data-parallel training, where weight gradients are exchanged, model-parallel training requires compressing activations and activation gradients as they propagate through layers, so compression errors accumulate from layer to layer. We propose a novel compression algorithm that compresses both the forward and backward passes, enabling up to 99% compression with no convergence degradation and negligible memory/compute overhead.
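To make the idea of compressing activations between pipeline stages concrete, here is a minimal NumPy sketch. It is an illustration only, not the paper's algorithm, which the abstract does not specify. It assumes a fixed orthonormal basis shared by sender and receiver, and the names `make_basis`, `compress`, and `decompress` are hypothetical. With hidden size 1000 and subspace dimension 10, each transmitted activation shrinks by 99%. Activations that already lie in the subspace are recovered exactly, while generic activations incur the kind of reconstruction error that would otherwise accumulate across layers.

```python
import numpy as np

def make_basis(d, k, seed=0):
    """Return a (d, k) matrix with orthonormal columns (hypothetical shared basis)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q

def compress(x, basis):
    """Project (batch, d) activations to (batch, k) coefficients before sending."""
    return x @ basis

def decompress(z, basis):
    """Map (batch, k) coefficients back to (batch, d) on the receiving stage."""
    return z @ basis.T

d, k = 1000, 10                      # assumed hidden size and subspace dimension
U = make_basis(d, k)
rng = np.random.default_rng(1)

# Activations constructed to lie inside the subspace: reconstruction is exact.
x_in = rng.standard_normal((4, k)) @ U.T
z = compress(x_in, U)
x_rec = decompress(z, U)
exact = np.allclose(x_rec, x_in)

# Generic activations: most of the signal falls outside the subspace.
x_gen = rng.standard_normal((4, d))
rel_err = (np.linalg.norm(decompress(compress(x_gen, U), U) - x_gen)
           / np.linalg.norm(x_gen))

compression = 1 - k / d              # fraction of values not transmitted
print(z.shape, exact, compression)
```

The same projection could be applied to activation gradients in the backward pass. Keeping the error small there depends on the network's activations and gradients actually staying close to the shared subspace, which this sketch simply assumes.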

artificial intelligence, machine learning, proceedings, (3 more...)



Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.41)