communication overhead
056e8e9c8ca9929cb6cf198952bf1dbb-Supplemental-Conference.pdf
This search does not affect the computational complexity, which is O(νnDE + SE) for agent n that computes DE parallel consensus steps and goes over a list of SE action profiles. Intuitively, we would need E KN to find the optimal action profile even with no noise, which creates delays where agents have to wait for their average reward to go above their λn. In the multitasking robots game, if agent n has Ren = 0, then the optimal action profile a e has to satisfy a e,m = n for all m. If λ is a safe margin away from the boundary of C(G), then most agents will have Ren = 0 most of the time. Hence, their performance depends on the best action profile in SE.
Distributed Distillation for On-Device Learning
On-device learning promises collaborative training of machine learning models across edge devices without the sharing of user data. In state-of-the-art on-device learning algorithms, devices communicate their model weights over a decentralized communication network. Transmitting model weights incurs a large communication overhead and means that only devices with identical model architectures can be included. To overcome these limitations, we introduce a distributed distillation algorithm where devices communicate and learn from soft-decision (softmax) outputs, which are inherently architecture-agnostic and scale only with the number of classes. The communicated soft-decisions are each model's outputs on a public, unlabeled reference dataset, which serves as a common vocabulary between devices. We prove that our algorithm converges with probability 1 to a stationary point where all devices in the communication network distill the entire network's knowledge on the reference data, regardless of their local connections. Our analysis assumes smooth loss functions, which can be non-convex. Simulations support our theoretical findings and show that even a naive implementation of our algorithm significantly reduces the communication overhead while achieving performance that is, depending on the regime, broadly comparable to the state of the art. By requiring little communication overhead and allowing for cross-architecture training, we remove two main obstacles to scaling on-device learning.
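The idea of exchanging soft decisions instead of weights can be illustrated with a small sketch. It is not the paper's algorithm: the function names, the plain averaging of soft decisions across devices, and the payload count are all assumptions made for illustration. It only shows that the message size depends on the number of reference samples and classes, not on the model architecture.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_decisions(logits_per_sample: List[List[float]]) -> List[List[float]]:
    """Soft decisions of one device on the shared reference samples."""
    return [softmax(logits) for logits in logits_per_sample]


def payload_floats(num_reference_samples: int, num_classes: int) -> int:
    """Number of floats sent per message when soft decisions are exchanged.

    Hypothetical helper: the count is independent of how many weights
    the sending model has.
    """
    return num_reference_samples * num_classes


def average_soft_decisions(
    decisions_per_device: List[List[List[float]]],
) -> List[List[float]]:
    """Element-wise mean of several devices' soft decisions.

    Assumption: a plain average stands in for whatever combination rule
    the actual algorithm uses; the result could serve as a distillation target.
    """
    num_devices = len(decisions_per_device)
    num_samples = len(decisions_per_device[0])
    num_classes = len(decisions_per_device[0][0])
    return [
        [
            sum(dev[s][c] for dev in decisions_per_device) / num_devices
            for c in range(num_classes)
        ]
        for s in range(num_samples)
    ]
```

Two devices with different architectures can both call `soft_decisions` on the same reference samples and exchange the results, since the output shape is fixed by the reference set and the class count alone.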
DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning
Sparse tensors appear frequently in federated deep learning, either as a direct artifact of the deep neural network's gradients, or as a result of an explicit sparsification process. Existing communication primitives are agnostic to the peculiarities of deep learning; consequently, they impose unnecessary communication overhead. This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors, tailored to federated deep learning. DeepReduce decomposes sparse tensors into two sets, values and indices, and allows both independent and combined compression of these sets. We support a variety of common compressors, such as Deflate for values, or run-length encoding for indices. We also propose two novel compression schemes that achieve superior results: a curve-fitting-based scheme for values, and a Bloom-filter-based scheme for indices. DeepReduce is orthogonal to existing gradient sparsifiers and can be applied in conjunction with them, transparently to the end-user, to significantly lower the communication overhead. As proof of concept, we implement our approach on TensorFlow and PyTorch. Our experiments with large real models demonstrate that DeepReduce transmits 320% less data than existing sparsifiers, without affecting accuracy.
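The decomposition into values and indices, with run-length encoding applied to the indices, can be sketched as follows. This is a minimal illustration under stated assumptions, not DeepReduce's implementation or API: all function names are hypothetical, the tensor is a flat Python list, and the indices are encoded as runs over a 0/1 occupancy bitmap.

```python
from typing import List, Tuple


def decompose(dense: List[float]) -> Tuple[List[int], List[float]]:
    """Split a mostly-zero vector into the indices and values of its nonzeros."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return indices, values


def rle_encode_indices(indices: List[int], length: int) -> List[int]:
    """Run-length encode the occupancy bitmap implied by `indices`.

    The output alternates run lengths of zeros and ones, starting with
    a run of zeros (which may have length 0).
    """
    bitmap = [0] * length
    for i in indices:
        bitmap[i] = 1
    runs: List[int] = []
    current, count = 0, 0
    for bit in bitmap:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current, count = bit, 1
    runs.append(count)
    return runs


def rle_decode_indices(runs: List[int]) -> List[int]:
    """Invert `rle_encode_indices`, recovering the sorted nonzero indices."""
    indices: List[int] = []
    pos, bit = 0, 0
    for run in runs:
        if bit == 1:
            indices.extend(range(pos, pos + run))
        pos += run
        bit ^= 1
    return indices


def reconstruct(indices: List[int], values: List[float], length: int) -> List[float]:
    """Rebuild the dense vector from its two compressed-side components."""
    dense = [0.0] * length
    for i, v in zip(indices, values):
        dense[i] = v
    return dense
```

Because the two sets are handled separately, the values could be passed through a different compressor (for example Deflate, as the abstract mentions) without touching the index encoding.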