Goto

Collaborating Authors

 parameter server





BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training

Neural Information Processing Systems

In distributed machine learning (DML), the network performance between machines significantly impacts the speed of iterative training. In this paper we propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than the current practice. BML runs on BCube network, instead of using the traditional Fat-Tree topology.




Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

Neural Information Processing Systems

We study the resilience to Byzantine failures of distributed implementations of Stochastic Gradient Descent (SGD). So far, distributed machine learning frameworks have largely ignored the possibility of failures, especially arbitrary (i.e., Byzantine) ones.