Reviews: Bayesian Distributed Stochastic Gradient Descent

Oct-7-2024, 16:59:35 GMT – Neural Information Processing Systems

Summary: This paper presents a new algorithm, Bayesian Distributed SGD (BDSGD), to mitigate the straggler problem when training deep learning models on parallel clusters. Unlike the synchronous distributed SGD approach, where a fixed cut-off (number of workers) is predefined, BDSGD uses amortized inference to predict workers' run-times and derives a straggler cut-off accordingly. BDSGD models the joint run-time behaviour of the workers, which are likely to be correlated due to the underlying cluster architecture. The approach is incorporated into the parameter server framework, where it decides which sub-gradients to drop in each iteration.

Strength: The proposed idea of an adaptive cut-off, with joint worker run-times predicted through amortized inference with a variational autoencoder loss, is novel and very interesting.
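To make the adaptive cut-off idea concrete, here is a minimal Python sketch of how a parameter server might turn predicted run-times into a cut-off. It is an illustration only, not the paper's implementation: the function name `choose_cutoff`, the throughput rule (gradients collected per second of waiting), and the use of plain point predictions in place of samples from a learned run-time model are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def choose_cutoff(predicted_runtimes: Sequence[float]) -> Tuple[int, List[int]]:
    """Pick how many workers to wait for, given predicted run-times.

    Hypothetical rule (an assumption for illustration): wait for the c
    fastest workers, where c maximizes c / t_(c) and t_(c) is the c-th
    smallest predicted run-time. Returns the cut-off c and the indices
    of the workers whose sub-gradients would be kept.
    """
    if not predicted_runtimes:
        raise ValueError("need at least one predicted run-time")
    if any(t <= 0 for t in predicted_runtimes):
        raise ValueError("run-times must be positive")

    # Worker indices sorted from fastest to slowest predicted run-time.
    order = sorted(range(len(predicted_runtimes)),
                   key=lambda i: predicted_runtimes[i])

    best_c, best_rate = 0, 0.0
    for c, worker in enumerate(order, start=1):
        # Throughput if the server waits for exactly the c fastest workers.
        rate = c / predicted_runtimes[worker]
        if rate > best_rate:
            best_c, best_rate = c, rate

    return best_c, order[:best_c]


# Example: worker 2 is predicted to be a straggler.
cutoff, kept = choose_cutoff([1.0, 1.2, 5.0, 1.1])
print(cutoff, kept)
```

In this example the server would wait for three workers and drop the sub-gradient from the slow one. A fixed cut-off would make the same choice every iteration, whereas a rule like this one changes with the predictions.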

amortized inference, bayesian, stochastic gradient descent

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.85)
  - Neural Networks (0.58)