Reviews: Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

Oct-9-2024, 04:32:40 GMT–Neural Information Processing Systems

In this paper, the authors present an algorithm for decentralized parallel stochastic gradient descent (PSGD). In contrast to centralized PSGD where worker nodes compute local gradients and the weights of a model are updated on a central node, decentralized PSGD seeks to perform training without a central node, in regimes where each node in a network can communicate with only a handful of adjacent nodes. While this network architecture has typically been viewed as a limitation, the authors present a theoretical analysis of their algorithm that suggests D-PSGD can achieve a linear speedup comparable to C-PSGD, but with significantly lower communication overhead. As a result, in certain low bandwidth or high latency network scenarios, D-PSGD can outperform C-PSGD. Overall, I believe the technical contributions of this paper could be very valuable.

d-psgd, decentralized algorithm outperform centralized algorithm, decentralized parallel stochastic gradient descent, (7 more...)

Neural Information Processing Systems

Oct-9-2024, 04:32:40 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)