Reviews: Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

Neural Information Processing Systems 

In this paper, the authors present an algorithm for decentralized parallel stochastic gradient descent (PSGD). In contrast to centralized PSGD where worker nodes compute local gradients and the weights of a model are updated on a central node, decentralized PSGD seeks to perform training without a central node, in regimes where each node in a network can communicate with only a handful of adjacent nodes. While this network architecture has typically been viewed as a limitation, the authors present a theoretical analysis of their algorithm that suggests D-PSGD can achieve a linear speedup comparable to C-PSGD, but with significantly lower communication overhead. As a result, in certain low bandwidth or high latency network scenarios, D-PSGD can outperform C-PSGD. Overall, I believe the technical contributions of this paper could be very valuable.