Byzantine Stochastic Gradient Descent

Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li

Neural Information Processing Systems 

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and may behave adversarially. Our main result is a variant of SGD which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\big( \frac{1}{\varepsilon^2 m} \big)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sample complexity and time complexity.
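
To make the setting concrete, the sketch below shows one iteration of distributed SGD in which $m$ oracles return stochastic gradients and a robust aggregation rule is applied before the update. The coordinate-wise median aggregator, the function names, and the toy parameters are illustrative assumptions only; the paper's actual algorithm uses a more involved concentration-based filtering procedure, not this rule.

```python
import numpy as np

def robust_sgd_step(x, grad_oracles, lr):
    """One iteration of distributed SGD with a robust aggregation step.

    x            -- current iterate (d-dimensional numpy array)
    grad_oracles -- list of m callables; each returns a stochastic gradient
                    at x, but an alpha-fraction may return arbitrary
                    (Byzantine) vectors
    lr           -- learning rate
    """
    grads = np.stack([g(x) for g in grad_oracles])  # shape (m, d)
    # Coordinate-wise median: a generic Byzantine-robust aggregator used
    # purely for illustration; it is NOT the filtering rule from the paper.
    agg = np.median(grads, axis=0)
    return x - lr * agg

# Toy usage: f(x) = ||x||^2 / 2, so an honest stochastic gradient is x + noise.
rng = np.random.default_rng(0)
m, d, alpha = 10, 5, 0.2
honest = lambda x: x + 0.1 * rng.standard_normal(x.shape)
byzantine = lambda x: 1e6 * np.ones_like(x)  # adversarial gradient
oracles = [byzantine] * int(alpha * m) + [honest] * (m - int(alpha * m))

x = np.ones(d)
for _ in range(200):
    x = robust_sgd_step(x, oracles, lr=0.1)
print(np.linalg.norm(x))  # close to 0: the median suppresses Byzantine inputs
```

With plain averaging the two Byzantine oracles above would drive the iterate far from the minimizer, whereas the robust aggregator keeps the update close to an honest gradient as long as the Byzantine fraction stays below one half.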