BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training
Songtao Wang, Dan Li, Yang Cheng, Jinkun Geng, Yanshu Wang, Shuai Wang, Shu-Tao Xia, Jianping Wu
Neural Information Processing Systems
In distributed machine learning (DML), the network performance between machines significantly impacts the speed of iterative training. In this paper, we propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than current practice. BML runs on a BCube network, instead of the traditional Fat-Tree topology.
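The abstract does not spell out BML's synchronization scheme, but the generic step it optimizes is synchronous gradient aggregation: each worker computes a local gradient, and the workers must agree on the element-wise average before the next iteration. The sketch below illustrates that baseline step only; it is a hypothetical, minimal illustration, not the BML algorithm or its BCube-aware communication pattern.

```python
def average_gradients(worker_grads):
    """Element-wise average of per-worker gradient vectors.

    This is the logical result every synchronous gradient-synchronization
    scheme (parameter server, ring all-reduce, BML, ...) must produce;
    the schemes differ in how traffic is routed, not in this math.
    """
    n_workers = len(worker_grads)
    dim = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n_workers for i in range(dim)]

# Example: 3 workers, each holding a 4-dimensional local gradient.
grads = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 6.0, 9.0, 12.0],
]
print(average_gradients(grads))  # [2.0, 4.0, 6.0, 8.0]
```

The interesting part of algorithms like BML is realizing this average with less traffic over cheaper network hardware; the arithmetic itself is fixed.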