Goto

Collaborating Authors

 decentralized computation


Collaborative Deep Learning in Fixed Topology Networks

Neural Information Processing Systems

There is significant recent interest to parallelize deep learning algorithms in order to handle the enormous growth in data and model sizes. While most advances focus on model parallelization and engaging multiple computing agents via using a central parameter server, aspect of data parallelization along with decentralized computation has not been explored sufficiently. In this context, this paper presents a new consensus-based distributed SGD (CDSGD) (and its momentum variant, CDMSGD) algorithm for collaborative deep learning over fixed topology networks that enables data parallelization as well as decentralized computation. Such a framework can be extremely useful for learning agents with access to only local/private data in a communication constrained environment. We analyze the convergence properties of the proposed algorithm with strongly convex and nonconvex objective functions with fixed and diminishing step sizes using concepts of Lyapunov function construction. We demonstrate the efficacy of our algorithms in comparison with the baseline centralized SGD and the recently proposed federated averaging algorithm (that also enables data parallelism) based on benchmark datasets such as MNIST, CIFAR-10 and CIFAR-100.


Collaborative Deep Learning in Fixed Topology Networks

Neural Information Processing Systems

There is significant recent interest to parallelize deep learning algorithms in order to handle the enormous growth in data and model sizes. While most advances focus on model parallelization and engaging multiple computing agents via using a central parameter server, aspect of data parallelization along with decentralized computation has not been explored sufficiently. In this context, this paper presents a new consensus-based distributed SGD (CDSGD) (and its momentum variant, CDMSGD) algorithm for collaborative deep learning over fixed topology networks that enables data parallelization as well as decentralized computation. Such a framework can be extremely useful for learning agents with access to only local/private data in a communication constrained environment. We analyze the convergence properties of the proposed algorithm with strongly convex and nonconvex objective functions with fixed and diminishing step sizes using concepts of Lyapunov function construction. We demonstrate the efficacy of our algorithms in comparison with the baseline centralized SGD and the recently proposed federated averaging algorithm (that also enables data parallelism) based on benchmark datasets such as MNIST, CIFAR-10 and CIFAR-100.




Collaborative Deep Learning in Fixed Topology Networks

Neural Information Processing Systems

There is significant recent interest to parallelize deep learning algorithms in order to handle the enormous growth in data and model sizes. While most advances focus on model parallelization and engaging multiple computing agents via using a central parameter server, aspect of data parallelization along with decentralized computation has not been explored sufficiently. In this context, this paper presents a new consensus-based distributed SGD (CDSGD) (and its momentum variant, CDMSGD) algorithm for collaborative deep learning over fixed topology networks that enables data parallelization as well as decentralized computation. Such a framework can be extremely useful for learning agents with access to only local/private data in a communication constrained environment. We analyze the convergence properties of the proposed algorithm with strongly convex and nonconvex objective functions with fixed and diminishing step sizes using concepts of Lyapunov function construction.


Collaborative Deep Learning in Fixed Topology Networks

Neural Information Processing Systems

There is significant recent interest to parallelize deep learning algorithms in order to handle the enormous growth in data and model sizes. While most advances focus on model parallelization and engaging multiple computing agents via using a central parameter server, aspect of data parallelization along with decentralized computation has not been explored sufficiently. In this context, this paper presents a new consensus-based distributed SGD (CDSGD) (and its momentum variant, CDMSGD) algorithm for collaborative deep learning over fixed topology networks that enables data parallelization as well as decentralized computation. Such a framework can be extremely useful for learning agents with access to only local/private data in a communication constrained environment. We analyze the convergence properties of the proposed algorithm with strongly convex and nonconvex objective functions with fixed and diminishing step sizes using concepts of Lyapunov function construction. We demonstrate the efficacy of our algorithms in comparison with the baseline centralized SGD and the recently proposed federated averaging algorithm (that also enables data parallelism) based on benchmark datasets such as MNIST, CIFAR-10 and CIFAR-100.


Collaborative Deep Learning in Fixed Topology Networks

arXiv.org Machine Learning

In this paper, we address the scalability of optimization algorithms for deep learning in a distributed setting. Scaling up deep learning [1] is becoming increasingly crucial for large-scale applications where the sizes of both the available data as well as the models are massive [2]. Among various algorithmic advances, many recent attempts have been made to parallelize stochastic gradient descent (SGD) based learning schemes across multiple computing agents. An early approach called Downpour SGD [3], developed within Google's disbelief software framework, primarily focuses on model parallelization (i.e., splitting the model across the agents). A different approach known as elastic averaging SGD (EASGD) [4] attempts to improve perform multiple SGDs in parallel; this method uses a central parameter server that helps in assimilating parameter updates from the computing agents. However, none of the above approaches concretely address the issue of data parallelization, which is an important issue for several learning scenarios: for example, data parallelization enables privacy-preserving learning in scenarios such as distributed learning with a network of mobile and Internet-of-Things (IoT) devices. A recent scheme called Federated Averaging SGD [5] attempts such a data parallelization in the context of deep learning with significant success; however, they still use a central parameter server. In contrast, deep learning with decentralized computation can be achieved via gossip SGD algorithms [6, 7], where agents communicate probabilistically without the aid of a parameter server.