Bayesian Distributed Stochastic Gradient Descent
Neural Information Processing Systems
We introduce Bayesian distributed stochastic gradient descent (BDSGD), a high-throughput algorithm for training deep neural networks on parallel computing clusters. The algorithm uses amortized inference in a deep generative model to perform joint posterior predictive inference of mini-batch gradient computation times in a compute-cluster-specific manner. Specifically, it mitigates the straggler effect in synchronous, gradient-based optimization by choosing an optimal cutoff beyond which mini-batch gradient messages from slow workers are ignored. The principal novel contribution and finding of this work go beyond this: we demonstrate that using run-times predicted by a generative model of cluster worker performance improves over the static-cutoff prior art, leading to higher gradient computation throughput on large compute clusters. In our experiments we show that eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but sometimes also increases the overall rate of convergence as a function of wall-clock time, by virtue of eliminating idleness.
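The straggler-cutoff mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `predict_cutoff`, `worker_gradient`, and `bdsgd_style_step` are illustrative assumptions, the threading-based "cluster" only simulates heterogeneous worker run-times, and `predict_cutoff` is a simple quantile stand-in for BDSGD's amortized deep generative model of worker performance. It shows only the structural idea: wait for gradients until a predicted cutoff, aggregate what arrived, and discard the rest.

```python
# Hypothetical sketch of a cutoff-based synchronous SGD step.
# Assumption: a toy quadratic loss 0.5 * ||params||^2, so each worker's
# gradient is params plus mini-batch noise.
import queue
import threading
import time

import numpy as np


def worker_gradient(worker_id, params, out_q, seed):
    """Simulate one worker computing a mini-batch gradient with a random run-time."""
    rng = np.random.default_rng(seed)
    time.sleep(rng.gamma(shape=2.0, scale=0.05))  # straggler-prone compute time
    grad = params + rng.normal(scale=0.1, size=params.shape)  # noisy toy gradient
    out_q.put((worker_id, grad))


def predict_cutoff(runtime_history, quantile=0.8):
    """Stand-in for the learned run-time model: wait for roughly the fastest 80%."""
    if not runtime_history:
        return 0.5  # conservative default before any observations
    return float(np.quantile(runtime_history, quantile))


def bdsgd_style_step(params, n_workers, runtime_history, lr=0.1, seed=0):
    out_q = queue.Queue()
    start = time.time()
    for w in range(n_workers):
        threading.Thread(target=worker_gradient,
                         args=(w, params, out_q, seed * 1000 + w),
                         daemon=True).start()

    cutoff = predict_cutoff(runtime_history)
    grads = []
    while True:
        remaining = cutoff - (time.time() - start)
        if remaining <= 0:
            break  # cutoff reached: remaining workers are treated as stragglers
        try:
            _, g = out_q.get(timeout=remaining)
            grads.append(g)
            runtime_history.append(time.time() - start)
        except queue.Empty:
            break

    if grads:  # aggregate only the gradients that beat the cutoff
        params = params - lr * np.mean(grads, axis=0)
    return params


params = np.ones(4)
history = []
for step in range(5):
    params = bdsgd_style_step(params, n_workers=8, runtime_history=history, seed=step)
print(params)
```

In this sketch the cutoff adapts only through an empirical quantile of observed arrival times; the paper's contribution is to replace that heuristic with posterior predictive run-times from a cluster-specific generative model, so the cutoff tracks the actual straggler distribution rather than a static rule.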