Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

Dec-24-2025, 10:13:25 GMT–Neural Information Processing Systems

We study the asynchronous stochastic gradient descent (SGD) algorithm for distributed training over $n$ workers that may be heterogeneous. In this algorithm, workers compute stochastic gradients in parallel at their own pace and return them to the server without any synchronization. Existing convergence rates of this algorithm for non-convex smooth objectives depend on the maximum gradient delay $\tau_{\max}$ (the largest number of server updates that occur between the moment a gradient's model is read and the moment that gradient is applied) and guarantee that an $\epsilon$-stationary point is reached after $O\!\left(\sigma^2\epsilon^{-2} + \tau_{\max}\epsilon^{-1}\right)$ iterations, where $\sigma^2$ bounds the variance of the stochastic gradients.
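To make the role of the delay concrete, here is a minimal event-driven simulation of asynchronous SGD. It is a sketch under stated assumptions, not the authors' setup: the function name `async_sgd`, the `speeds` model of heterogeneity (worker $i$ needs `1/speeds[i]` time units per gradient), the Gaussian gradient noise, and the quadratic test objective are all hypothetical choices for illustration. The server applies every gradient the moment it arrives, even though it was computed at an older model, and records the staleness $\tau_t$ of each update.

```python
import heapq

import numpy as np


def async_sgd(grad, x0, speeds, lr, num_updates, noise_std, seed=0):
    """Illustrative asynchronous SGD simulation (hypothetical helper).

    Assumption: worker i takes 1/speeds[i] time units per stochastic gradient.
    Each arriving gradient is applied immediately; its delay is the number of
    server updates made since the worker read the model it used.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    t = 0  # server iteration counter
    # Event tuple: (finish_time, worker_id, iteration_read, model_snapshot).
    # Each worker has exactly one pending event, so (finish_time, worker_id)
    # is unique and the array is never compared.
    events = []
    for i, s in enumerate(speeds):
        heapq.heappush(events, (1.0 / s, i, 0, x.copy()))
    delays = []
    while t < num_updates:
        finish_time, i, t_read, x_stale = heapq.heappop(events)
        # Stochastic gradient evaluated at the stale model x_{t - tau_t}.
        g = grad(x_stale) + noise_std * rng.standard_normal(x.shape)
        x = x - lr * g
        delays.append(t - t_read)  # staleness tau_t of this update
        t += 1
        # The worker immediately reads the fresh model and starts again.
        heapq.heappush(events, (finish_time + 1.0 / speeds[i], i, t, x.copy()))
    return x, delays


if __name__ == "__main__":
    # Quadratic f(x) = 0.5 * ||x||^2, so grad f(x) = x (hypothetical test problem).
    x, delays = async_sgd(lambda z: z, [1.0, 1.0], speeds=[1.0, 1.0, 0.1],
                          lr=0.05, num_updates=200, noise_std=0.01)
    print("tau_max:", max(delays), "mean delay:", sum(delays) / len(delays))
```

With two fast workers and one worker ten times slower, most applied gradients have delay 1, yet the single slow worker pushes the observed $\tau_{\max}$ to about 20. This gap between the typical and the maximum delay is why a bound driven by $\tau_{\max}$ can be pessimistic for heterogeneous workers.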

asynchronous sgd, convergence rate, sharper convergence guarantee