Distributed SGD Generalizes Well Under Asynchrony

Jayanth Regatti, Gaurav Tendolkar, Yi Zhou, Abhishek Gupta, Yingbin Liang

arXiv.org Machine Learning 

Abstract -- The performance of fully synchronized distributed systems has become a bottleneck under the big-data trend, and asynchronous distributed systems are gaining popularity due to their scalability. In this paper, we study the generalization performance of stochastic gradient descent (SGD) on a distributed asynchronous system. The system consists of multiple worker machines that compute stochastic gradients, which are sent to and aggregated on a common parameter server to update the model variables; the communication in the system may suffer from delays. Under the algorithmic stability framework, we prove that distributed asynchronous SGD generalizes well given enough data samples in the training optimization. In particular, our results suggest reducing the learning rate as more asynchrony is allowed in the distributed system. Such an adaptive learning-rate strategy improves the stability of the distributed algorithm and reduces the corresponding generalization error. We then confirm our theoretical findings via numerical experiments.

Introduction

Stochastic gradient descent (SGD) and its variants (e.g., Adagrad, Adam) have been very effective in solving many challenging machine learning problems, such as training deep neural networks. In practice, the solution found by SGD via solving an empirical risk minimization problem typically has good generalization performance on the test dataset.
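The delayed-gradient dynamics described above can be illustrated with a minimal simulation: a parameter server applies gradients that were computed at parameter snapshots up to a maximum number of iterations old. This is a hedged sketch of the general setting, not the paper's exact algorithm; the function names, the delay distribution (uniform over the feasible range), and the toy quadratic objective are all illustrative assumptions.

```python
import numpy as np

def async_sgd(grad_fn, w0, lr, max_delay, steps, seed=0):
    """Simulate parameter-server SGD under bounded asynchrony.

    At step t the server applies a gradient evaluated at a snapshot
    of the parameters that is up to `max_delay` iterations stale,
    mimicking communication delay from asynchronous workers.
    (Illustrative sketch; the delay model is an assumption.)
    """
    rng = np.random.default_rng(seed)
    history = [np.asarray(w0, dtype=float)]
    w = history[0].copy()
    for t in range(steps):
        # Worker read the parameters `delay` iterations ago.
        delay = int(rng.integers(0, min(max_delay, t) + 1))
        stale_w = history[t - delay]
        # Server update using the stale gradient.
        w = w - lr * grad_fn(stale_w)
        history.append(w.copy())
    return w

# Toy quadratic objective f(w) = 0.5 * ||w||^2, so grad(w) = w.
grad = lambda w: w
w_final = async_sgd(grad, np.ones(2), lr=0.1, max_delay=5, steps=300)
```

With a small enough learning rate the iterates still contract toward the minimizer despite staleness, which is consistent with the paper's suggestion to shrink the learning rate as the allowed asynchrony grows.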
