
Communication-Efficient Learning



Reviews: ATOMO: Communication-efficient Learning via Atomic Sparsification

Neural Information Processing Systems

After the rebuttal, I do not wish to change my evaluation. Regarding convergence, I think this should be clarified in the paper, at least to ensure that the method does not produce divergent sequences under reasonable assumptions. As for the variance, the authors control the variance of a certain variable \hat{g} given g, but they should control the variance of \hat{g} without conditioning in order to invoke general convergence results. This is very minor but should be mentioned. The authors consider the problem of empirical risk minimization using a distributed stochastic gradient descent algorithm.
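The reviewer's point about unconditional variance can be made precise with the law of total variance: assuming the sparsified gradient \hat{g} is unbiased given g (i.e. \mathbb{E}[\hat{g} \mid g] = g), the unconditional variance decomposes as

```latex
\operatorname{Var}(\hat{g})
  = \mathbb{E}\!\left[\operatorname{Var}(\hat{g} \mid g)\right]
  + \operatorname{Var}\!\left(\mathbb{E}[\hat{g} \mid g]\right)
  = \mathbb{E}\!\left[\operatorname{Var}(\hat{g} \mid g)\right]
  + \operatorname{Var}(g),
```

so a bound on the conditional variance, combined with the variance of the stochastic gradient g itself, yields the unconditional bound needed for standard convergence results.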


Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data

Castiglia, Timothy, Das, Anirban, Wang, Shiqiang, Patterson, Stacy

arXiv.org Artificial Intelligence

We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. In C-VFL, a server and multiple parties collaboratively train a model on their respective features utilizing several local iterations and sharing compressed intermediate results periodically. Our work provides the first theoretical analysis of the effect message compression has on distributed training over vertically partitioned data. We prove convergence of non-convex objectives at a rate of $O(\frac{1}{\sqrt{T}})$ when the compression error is bounded over the course of training. We provide specific requirements for convergence with common compression techniques, such as quantization and top-$k$ sparsification. Finally, we experimentally show compression can reduce communication by over $90\%$ without a significant decrease in accuracy over VFL without compression.
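As an illustration of one compression technique mentioned in the abstract, here is a minimal top-$k$ sparsification sketch in Python. The function name and NumPy-based setup are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def top_k_sparsify(g, k):
    """Keep only the k largest-magnitude entries of vector g, zeroing the
    rest. A common way to compress gradients or intermediate results
    before transmission in distributed training."""
    # argpartition finds the indices of the k largest |g_i| in O(n) time
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out = np.zeros_like(g)
    out[idx] = g[idx]
    return out

g = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
print(top_k_sparsify(g, 2))  # keeps -3.0 and 2.0, zeros the rest
```

Only the k surviving values and their indices need to be sent, which is where the communication savings come from.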


Communication-Efficient Learning of Deep Networks from Decentralized Data

McMahan, H. Brendan, Moore, Eider, Ramage, Daniel, Hampson, Seth, Arcas, Blaise Agüera y

arXiv.org Artificial Intelligence

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.
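The aggregation step described above, iterative model averaging weighted by local data size, can be sketched as follows. This is a simplified illustration in Python; the function name and flat-array model representation are assumptions, not the paper's code:

```python
import numpy as np

def federated_averaging(client_weights, client_sizes):
    """Aggregate locally trained model parameters by a weighted average,
    with each client's weight proportional to its number of local
    training examples."""
    total = sum(client_sizes)
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

# Two clients with locally updated parameters; the second holds 3x the data
w1 = np.array([1.0, 2.0])
w2 = np.array([3.0, 4.0])
avg = federated_averaging([w1, w2], [10, 30])
print(avg)  # weighted toward w2: [2.5, 3.5]
```

In the full algorithm this average becomes the new shared model, which is broadcast back to (a sample of) the clients for the next round of local training.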


Introduction to Federated Learning

#artificialintelligence

There are over 5 billion mobile device users all over the world. These users generate massive amounts of data--via cameras, microphones, and other sensors like accelerometers--which can, in turn, be used for building intelligent applications. Conventionally, such data is collected in data centers, where machine/deep learning models are trained to power those applications. However, because this data is often privacy sensitive, large in quantity, or both, common centralized learning techniques aren't appropriate--users are much less likely to share their data, and thus the data will be available only on the devices. This is where federated learning comes into play. In Google's research paper titled Communication-Efficient Learning of Deep Networks from Decentralized Data [1], the researchers provide the following high-level definition of federated learning: a learning technique that allows users to collectively reap the benefits of shared models trained from [this] rich data, without the need to centrally store it.
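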


ATOMO: Communication-efficient Learning via Atomic Sparsification

Wang, Hongyi, Sievert, Scott, Liu, Shengchao, Charles, Zachary, Papailiopoulos, Dimitris, Wright, Stephen

Neural Information Processing Systems

Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance.
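To illustrate the unbiasedness mentioned in the last sentence, a minimal random sparsification of atom coefficients can be sketched as below. This shows only the rescale-by-inverse-probability trick that makes the estimate unbiased; ATOMO's actual contribution is choosing the probabilities to minimize variance under the sparsity budget, which is not shown. Names here are illustrative assumptions:

```python
import numpy as np

def unbiased_sparsify(atoms, probs, rng=None):
    """Keep coefficient i with probability probs[i] and rescale survivors
    by 1/probs[i], so the output equals the input in expectation."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(len(atoms)) < probs
    return np.where(mask, atoms / probs, 0.0)

atoms = np.array([1.0, -2.0, 3.0])  # e.g. singular values of a gradient matrix
probs = np.array([0.5, 0.9, 0.3])   # per-atom keep probabilities (all > 0)
print(unbiased_sparsify(atoms, probs))  # random sparse vector, unbiased on average
```

Because E[mask_i / probs[i]] = 1, averaging many such sparsified gradients recovers the true gradient, which is what makes this compatible with SGD convergence analyses.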