Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees

Dec-24-2025, 13:06:01 GMT–Neural Information Processing Systems

Communication compression is a crucial technique for modern distributed learning systems to alleviate communication bottlenecks over slow networks. Despite intensive recent study of gradient compression for data-parallel training, compressing the activations of models trained with pipeline parallelism remains an open problem. In this paper, we propose AQ-SGD, a novel activation compression algorithm for communication-efficient pipeline-parallel training over slow networks.

activation quantization, compression, fine-tuning language model

