Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

Neural Information Processing Systems 

We study the asynchronous stochastic gradient descent algorithm for distributed training over n workers whose computation and communication speeds vary over time. In this algorithm, workers compute stochastic gradients in parallel at their own pace and return them to the server without any synchronization.
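
As a rough illustration of this update pattern, the sketch below simulates asynchronous SGD on a toy least-squares problem: each worker computes a gradient at a possibly stale copy of the model, and the server applies it as soon as it arrives, without waiting for the other workers. The problem setup, worker speeds, and step size here are illustrative assumptions, not the paper's algorithm specification or experimental protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: f(x) = 1/(2m) * ||A x - b||^2  (assumed for illustration)
m, d = 200, 10
A = rng.normal(size=(m, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=m)

def stochastic_grad(x, batch=16):
    """Mini-batch gradient of the least-squares loss at x."""
    idx = rng.integers(0, m, size=batch)
    return A[idx].T @ (A[idx] @ x - b[idx]) / batch

n_workers = 4      # hypothetical number of workers
lr = 0.05          # hypothetical step size
T = 500            # number of server updates

x = np.zeros(d)    # current server model
# Each worker holds the (possibly stale) model copy it last received
# and finishes its current gradient computation at a random time,
# modeling heterogeneous computation speeds.
worker_model = [x.copy() for _ in range(n_workers)]
finish_time = rng.integers(1, 8, size=n_workers)

t, clock = 0, 0
while t < T:
    clock += 1
    for i in range(n_workers):
        if clock >= finish_time[i]:
            # Worker i returns a gradient computed at its stale copy;
            # the server applies it immediately, with no synchronization.
            g = stochastic_grad(worker_model[i])
            x -= lr * g
            t += 1
            # Worker i then picks up the current model and starts a new job.
            worker_model[i] = x.copy()
            finish_time[i] = clock + rng.integers(1, 8)

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```

Because the server never waits, the gradient applied at each step may be evaluated at an outdated model; the delay between when a worker reads the model and when its gradient is applied is the staleness that the convergence analysis must account for.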