Reviews: Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent
Neural Information Processing Systems
This paper presents a formal condition for Byzantine fault-tolerant stochastic gradient aggregation, as well as an algorithm (Krum) that satisfies the condition. Given the definition, it is straightforward to see that aggregation based on linear combination (including averaging) is not Byzantine tolerant: a single Byzantine worker can pull the average arbitrarily far by submitting one extreme vector. Meanwhile, the paper gives evidence that Krum is not only tolerant in theory but reasonably so in practice as well. I found the paper to be clear, well-organized, self-contained, and altogether fairly thorough. The basic motivation is also clear: the literature on distributed learning has focused on statistical assumptions and well-behaved actors, but what can we do under very pessimistic conditions?
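For concreteness, here is a minimal sketch of the Krum selection rule as the paper describes it: with n workers of which at most f are Byzantine (requiring n > 2f + 2), each proposed gradient is scored by the sum of squared distances to its n − f − 2 closest peers, and the lowest-scoring gradient is selected. The NumPy code and names below are my own illustration, not the authors' implementation:

```python
import numpy as np

def krum(gradients, f):
    """Select one worker's gradient via the Krum rule.

    gradients: list of n 1-D numpy arrays, one per worker.
    f: assumed number of Byzantine workers; requires n > 2f + 2.
    """
    n = len(gradients)
    assert n > 2 * f + 2, "Krum requires n > 2f + 2"
    # Pairwise squared Euclidean distances between all proposed gradients.
    dists = np.array([[np.sum((g_i - g_j) ** 2) for g_j in gradients]
                      for g_i in gradients])
    scores = []
    for i in range(n):
        # Distances from worker i to every other worker, sorted ascending.
        others = np.sort(np.delete(dists[i], i))
        # Score = summed squared distance to the n - f - 2 closest peers.
        scores.append(others[: n - f - 2].sum())
    # Return the gradient whose neighbourhood is tightest.
    return gradients[int(np.argmin(scores))]

# Hypothetical usage: 7 honest gradients plus 2 extreme Byzantine ones.
rng = np.random.default_rng(0)
grads = [rng.normal(0.0, 0.1, size=4) for _ in range(7)]
grads += [np.full(4, 1e6), np.full(4, -1e6)]  # adversarial proposals
print(krum(grads, f=2))  # stays near an honest gradient; the average would not
```

The contrast with averaging is visible in the usage line: the two extreme vectors would dominate a linear combination, but they land far from every honest gradient's neighbourhood and so are never selected by Krum.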