Byzantine Stochastic Gradient Descent
Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li
Neural Information Processing Systems
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of $m$ machines that allegedly compute stochastic gradients in each iteration, an $\alpha$-fraction are Byzantine and may behave adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\big( \frac{1}{\varepsilon^2 m} \big)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sample complexity and time complexity.
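To make the setting concrete, the sketch below simulates distributed SGD with an $\alpha$-fraction of Byzantine workers on a simple convex quadratic. Note this is an illustration only: it substitutes coordinate-wise median aggregation as a generic robust-aggregation stand-in, not the paper's actual ByzantineSGD algorithm (which relies on a more involved concentration-based filtering of machines across iterations). The objective, noise model, attack, and all parameter values are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10                  # dimension (assumed for the demo)
m = 20                  # number of machines
alpha = 0.2             # fraction of Byzantine machines
n_byz = int(alpha * m)
x_star = np.ones(d)     # minimizer of f(x) = 0.5 * ||x - x_star||^2

def honest_gradient(x):
    """Stochastic gradient of the convex quadratic, with Gaussian noise."""
    return (x - x_star) + rng.normal(scale=1.0, size=d)

def byzantine_gradient(x):
    """One possible adversarial reply: the negated, scaled true gradient."""
    return -10.0 * (x - x_star)

x = np.zeros(d)
eta = 0.1
for t in range(500):
    grads = [honest_gradient(x) for _ in range(m - n_byz)]
    grads += [byzantine_gradient(x) for _ in range(n_byz)]
    # Coordinate-wise median: a simple stand-in for robust aggregation,
    # NOT the paper's method. Plain averaging would be derailed here.
    g = np.median(np.stack(grads), axis=0)
    x -= eta * g

print("distance to minimizer:", np.linalg.norm(x - x_star))
```

Running this, the iterate converges close to `x_star` despite 20% of the replies being adversarial, whereas replacing the median with a plain mean lets the Byzantine machines drive the iterate away; the paper's contribution is an aggregation rule whose iteration complexity provably matches the $\tilde{O}\big(\frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2}\big)$ bound above.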
Dec-31-2018