Asynchronous Stochastic Optimization Robust to Arbitrary Delays

Oct-10-2024, 06:44:43 GMT–Neural Information Processing Systems

We consider the problem of stochastic optimization with delayed gradients in which, at each time step t, the algorithm makes an update using a stale stochastic gradient from step t - d_t for some arbitrary delay d_t. This setting abstracts asynchronous distributed optimization, where a central server receives gradient updates computed by worker machines. These machines can experience computation and communication loads that may vary significantly over time. In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires O(\sigma^2/\epsilon^4 + \tau/\epsilon^2) steps for finding an \epsilon-stationary point x, where \tau is the average delay \frac{1}{T}\sum_{t=1}^{T} d_t and \sigma^2 is the variance of the stochastic gradients. This improves over previous work, which showed that stochastic gradient descent achieves the same rate but with respect to the maximal delay \max_t d_t, which can be significantly larger than the average delay, especially in heterogeneous distributed systems.
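The delayed-gradient setting above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: it runs plain SGD with stale gradients (the baseline the abstract compares against), not the authors' delay-robust algorithm, whose details the abstract does not give. The names `delayed_sgd`, `delay_summary` and `quad_grad`, the additive Gaussian gradient noise, and the convention that a delay reaching before step 0 falls back to the initial point are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def delayed_sgd(
    grad: Callable[[Vector], Vector],
    x0: Sequence[float],
    delays: Sequence[int],
    lr: float,
    noise_std: float = 0.0,
    seed: int = 0,
) -> List[Vector]:
    """Plain SGD with stale gradients: x_{t+1} = x_t - lr * g_t,
    where g_t is a noisy gradient evaluated at x_{t - d_t}.

    This is the delayed-SGD baseline, not the paper's algorithm.
    Hypothetical convention: if t - d_t < 0, the gradient is taken at x_0.
    """
    rng = random.Random(seed)
    iterates: List[Vector] = [list(x0)]
    for t, d in enumerate(delays):
        if d < 0:
            raise ValueError("delays must be non-negative")
        stale = iterates[max(t - d, 0)]  # point the worker read d steps ago
        # Stochastic gradient = true gradient + hypothetical Gaussian noise.
        g = [gi + noise_std * rng.gauss(0.0, 1.0) for gi in grad(stale)]
        x_t = iterates[t]
        iterates.append([xi - lr * gi for xi, gi in zip(x_t, g)])
    return iterates


def delay_summary(delays: Sequence[int]) -> Tuple[float, int]:
    """Return (average delay tau, maximal delay max_t d_t)."""
    return sum(delays) / len(delays), max(delays)


def quad_grad(x: Vector) -> Vector:
    """Gradient of f(x) = 0.5 * ||x||^2, whose only stationary point is 0."""
    return list(x)


if __name__ == "__main__":
    delays = [1] * 1000
    delays[-1] = 500  # a single straggler worker
    avg, worst = delay_summary(delays)
    print(avg, worst)  # 1.499 500
    xs = delayed_sgd(quad_grad, [1.0, -2.0], delays, lr=0.1)
    print(max(abs(v) for v in xs[-1]))
```

A single straggler inflates the maximal delay to 500 while the average delay stays near 1.5, which illustrates why a bound depending on the average delay \tau rather than on \max_t d_t can be far tighter in heterogeneous systems. Keeping every iterate in memory is a simplification for this sketch.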

algorithm, arbitrary delay, asynchronous stochastic optimization robust

