Collaborative Deep Learning in Fixed Topology Networks

Zhanhong Jiang, Aditya Balu, Chinmay Hegde, Soumik Sarkar

arXiv.org Machine Learning 

In this paper, we address the scalability of optimization algorithms for deep learning in a distributed setting. Scaling up deep learning [1] is becoming increasingly crucial for large-scale applications where both the available data and the models are massive [2]. Among various algorithmic advances, many recent attempts have been made to parallelize stochastic gradient descent (SGD) based learning schemes across multiple computing agents. An early approach called Downpour SGD [3], developed within Google's DistBelief software framework, primarily focuses on model parallelization (i.e., splitting the model across the agents). A different approach known as elastic averaging SGD (EASGD) [4] performs multiple SGDs in parallel; this method uses a central parameter server that helps in assimilating parameter updates from the computing agents. However, neither of the above approaches concretely addresses data parallelization, which is important for several learning scenarios: for example, data parallelization enables privacy-preserving learning in scenarios such as distributed learning with a network of mobile and Internet-of-Things (IoT) devices. A recent scheme called Federated Averaging SGD [5] attempts such data parallelization in the context of deep learning with significant success; however, it still relies on a central parameter server. In contrast, deep learning with decentralized computation can be achieved via gossip SGD algorithms [6, 7], where agents communicate probabilistically without the aid of a parameter server.
