Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
