Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
