Communication trade-offs for Local-SGD with large step size

Dieuleveut, Aymeric, Patel, Kumar Kshitij

Mar-19-2020, 02:16:08 GMT – Neural Information Processing Systems

Synchronous mini-batch SGD is state-of-the-art for large-scale distributed machine learning. In practice, however, its convergence is bottlenecked by slow communication rounds between worker nodes. A natural way to reduce communication is the "local-SGD" model, in which the workers train their models independently and synchronize only periodically. This algorithm improves the computation-communication trade-off, but its convergence is not well understood. We propose a non-asymptotic error analysis that enables comparison to one-shot averaging (a single communication round among independent workers) and mini-batch averaging (communication at every step).
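The three communication regimes named in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the scalar least-squares problem, the function name `local_sgd`, the target value `w_star`, and all parameter values. The only thing that changes between regimes is how often the workers average their iterates.

```python
import random


def local_sgd(num_workers, num_steps, sync_every, step_size, seed=0):
    """Toy local-SGD on a scalar least-squares problem.

    Returns (final averaged estimate, number of communication rounds).
    Hypothetical setup: y = w_star * x + noise, loss 0.5 * (w * x - y) ** 2.
    """
    rng = random.Random(seed)
    w_star = 2.0                     # assumed target parameter
    workers = [0.0] * num_workers    # all workers start from the same point
    rounds = 0
    for t in range(1, num_steps + 1):
        # Each worker takes one independent SGD step on its own fresh sample.
        for k in range(num_workers):
            x = rng.gauss(0.0, 1.0)
            y = w_star * x + rng.gauss(0.0, 0.1)
            grad = (workers[k] * x - y) * x
            workers[k] -= step_size * grad
        # Communication round: replace every local iterate by the average.
        if t % sync_every == 0:
            avg = sum(workers) / num_workers
            workers = [avg] * num_workers
            rounds += 1
    # Final average (a no-op if the last step was already a sync step).
    return sum(workers) / num_workers, rounds


if __name__ == "__main__":
    steps = 200
    for name, every in [("mini-batch averaging", 1),
                        ("local-SGD", 10),
                        ("one-shot averaging", steps)]:
        estimate, rounds = local_sgd(4, steps, every, step_size=0.05)
        print(f"{name}: estimate={estimate:.4f}, communication rounds={rounds}")
```

With `sync_every=1` the workers communicate at every step, with `sync_every=num_steps` they communicate once at the end, and values in between give local-SGD. The sketch only shows the communication pattern; it says nothing about the error bounds derived in the paper.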

communication trade-off, local-sgd, step size

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.49)