Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
Neural Information Processing Systems
In the synchronous stochastic gradient method, each worker processes a random minibatch of its training data, and then the local updates are synchronized by making an All-Reduce step, which aggregates stochastic gradients from all workers, and taking a Broadcast step that transmits the updated parameter vector back to all workers.
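The synchronous step described above can be sketched in NumPy as a single-process simulation. This is an illustrative sketch only, not the paper's implementation: the names (`num_workers`, `local_gradient`, and the quadratic toy objective) are assumptions, and the All-Reduce is modeled as a plain average of the workers' gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, lr = 4, 8, 0.1
params = np.zeros(dim)

def local_gradient(worker_params, rng):
    # Stand-in for a stochastic gradient computed on a worker's random
    # minibatch (here: gradient of a noisy quadratic, purely illustrative).
    return worker_params - rng.normal(size=worker_params.shape)

# Each worker holds a copy of the parameters (result of the last Broadcast).
worker_params = [params.copy() for _ in range(num_workers)]

# All-Reduce step: aggregate (here, average) the stochastic gradients
# from all workers.
grads = [local_gradient(p, rng) for p in worker_params]
avg_grad = np.mean(grads, axis=0)

# Update the parameter vector, then Broadcast it back to all workers.
params = params - lr * avg_grad
worker_params = [params.copy() for _ in range(num_workers)]
```

In a real distributed setting the averaging and the broadcast would be collective communication calls (e.g. MPI or NCCL primitives) rather than in-process list operations; the communication cost of the All-Reduce is exactly what gradient sparsification aims to reduce.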