quantized gradient
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.05)
- (13 more...)
We thank all reviewers for their time and effort in reviewing our paper. We set up experiments in PyTorch with ResNet18 (He et al., 2016) on CIFAR10 (Krizhevsky, 2009). The results on CIFAR10 are shown in Figure 1 above. We will release our code on GitHub in the final version.
Figure 1: Evaluations on CIFAR10: training loss (1st column), test accuracy (2nd column), and total number of transmitted bits (MB).
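A minimal sketch of the experimental setup mentioned in the rebuttal above (ResNet18 trained on CIFAR10 in PyTorch). The transforms, hyperparameters, and training loop below are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: ResNet18 (He et al., 2016) on CIFAR10 (Krizhevsky, 2009) in PyTorch.
# Hyperparameters are illustrative, not the authors' settings.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor(),
                       T.Normalize((0.4914, 0.4822, 0.4465),
                                   (0.2470, 0.2435, 0.2616))])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=5e-4)

for images, labels in train_loader:   # one epoch shown
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```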
Rate-Constrained Quantization for Communication-Efficient Federated Learning
Hamidi, Shayan Mohajer, Bereyhi, Ali
Quantization is a common approach to mitigate the communication cost of federated learning (FL). In practice, the quantized local parameters are further encoded via an entropy coding technique, such as Huffman coding, for efficient data compression. In this case, the exact communication overhead is determined by the bit rate of the encoded gradients. Recognizing this fact, this work deviates from the existing approaches in the literature and develops a novel quantized FL framework, called rate-constrained federated learning (RC-FED), in which the gradients are quantized subject to both fidelity and data rate constraints. We formulate this scheme as a joint optimization in which the quantization distortion is minimized while the rate of the encoded gradients is kept below a target threshold. This enables a tunable trade-off between quantization distortion and communication cost. We analyze the convergence behavior of RC-FED and show its superior performance against baseline quantized FL schemes on several datasets.
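A hypothetical sketch of the rate-constrained idea in the RC-FED abstract: choose the finest uniform quantizer whose entropy-coded rate (estimated by the empirical entropy of the quantized symbols, a proxy for the Huffman bit rate) stays below a target budget. The quantizer, the entropy proxy, and the bit-width search are assumptions for illustration, not the paper's algorithm.

```python
# Sketch: quantize a gradient with the lowest distortion subject to a rate budget.
import numpy as np

def empirical_entropy_bits(symbols):
    """Estimate bits/symbol an entropy coder (e.g. Huffman coding) would need."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def rate_constrained_quantize(grad, rate_budget_bits=3.0, max_bits=8):
    """Finest uniform quantizer whose estimated coded rate meets the budget."""
    scale_ref = np.abs(grad).max() + 1e-12
    for bits in range(max_bits, 0, -1):          # search from finest to coarsest
        levels = 2 ** bits - 1
        step = 2 * scale_ref / levels
        q = np.round(grad / step)                # integer symbols to be entropy coded
        if empirical_entropy_bits(q) <= rate_budget_bits:
            return q * step, bits                # dequantized gradient, chosen bit-width
    return np.sign(grad) * scale_ref, 1          # fall back to 1-bit quantization

g = np.random.randn(10000) * 0.01
g_hat, bits = rate_constrained_quantize(g)
print(bits, np.mean((g - g_hat) ** 2))           # chosen precision and distortion
```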
One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training
Ma, Lianbo, Zhou, Yuee, Ma, Jianlun, Yu, Guo, Li, Qing
Weight quantization is an effective technique to compress deep neural networks for deployment on edge devices with limited resources. Traditional loss-aware quantization methods commonly use the quantized gradient to replace the full-precision gradient. However, we discover that the resulting gradient error leads to an unexpected zig-zagging issue in the gradient descent learning procedure, where the gradient directions rapidly oscillate or zig-zag, and this issue seriously slows down model convergence. Accordingly, this paper proposes a one-step forward and backtrack scheme for loss-aware quantization that obtains a more accurate and stable gradient direction to defy this issue. During gradient descent learning, a one-step forward search is designed to find the trial gradient of the next step, which is adopted to adjust the gradient of the current step towards the direction of fast convergence. After that, we backtrack to the current step and update the full-precision and quantized weights using the current-step gradient and the trial gradient. A series of theoretical analyses and experiments on benchmark deep models demonstrates the effectiveness and competitiveness of the proposed method, which especially outperforms others in convergence performance.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
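A rough sketch of the "one-step forward and backtrack" idea in the abstract above: take a trial step with the current quantized gradient, probe the gradient at the trial point, blend the two directions, then backtrack and update from the original point. The toy quantizer and the blending rule are placeholders, not the paper's exact scheme.

```python
# Sketch: damping zig-zagging by mixing the current and trial quantized gradients.
import numpy as np

def quantize(w, num_levels=16):
    """Toy uniform weight quantizer (stand-in for the paper's quantizer)."""
    scale = np.abs(w).max() / (num_levels // 2) + 1e-12
    return np.round(w / scale) * scale

def forward_backtrack_step(w, grad_fn, lr=0.1, beta=0.5):
    g_now = grad_fn(quantize(w))                  # quantized gradient at current point
    w_trial = w - lr * g_now                      # one-step forward (trial point)
    g_trial = grad_fn(quantize(w_trial))          # trial gradient of the next step
    g_mix = (1 - beta) * g_now + beta * g_trial   # adjusted, less oscillatory direction
    return w - lr * g_mix                         # backtrack: update from the ORIGINAL point

# Example on an ill-conditioned quadratic where plain descent tends to zig-zag.
A = np.diag([1.0, 25.0])
grad_fn = lambda w: A @ w
w = np.array([1.0, 1.0])
for _ in range(50):
    w = forward_backtrack_step(w, grad_fn, lr=0.03)
print(w)   # approaches the minimum at the origin
```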
Mixed Precision Quantization to Tackle Gradient Leakage Attacks in Federated Learning
Ovi, Pretom Roy, Dey, Emon, Roy, Nirmalya, Gangopadhyay, Aryya
Federated Learning (FL) enables collaborative model building among a large number of participants without explicit data sharing. However, this approach is vulnerable to privacy inference attacks. In particular, in the event of a gradient leakage attack, which has a high success rate in retrieving sensitive data from model gradients, FL models are at elevated risk because communication is inherent to their architecture. The most alarming aspect of this attack is that it can be performed so covertly that it does not hamper training performance, while the attackers backtrack from the gradients to obtain information about the raw data. The two most common countermeasures are homomorphic encryption and adding noise with differential privacy parameters, and each has a major drawback: the key generation process becomes tedious as the number of clients grows, and noise-based differential privacy suffers a significant drop in global model accuracy. As a countermeasure, we propose a mixed-precision quantized FL scheme and empirically show that both of the above issues can be resolved. In addition, our approach offers more robustness, as different layers of the deep model are quantized with different precisions and quantization modes. We empirically validated our method on three benchmark datasets and found a minimal accuracy drop in the global model after applying quantization.
- North America > United States > Maryland > Baltimore County (0.05)
- North America > United States > Maryland > Baltimore (0.05)
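A minimal sketch of the layer-wise mixed-precision idea in the abstract above: quantize each layer of a client update with its own bit-width before it leaves the client, so full-precision gradients are never transmitted. The bit-width assignment below is an arbitrary illustration, not the authors' policy.

```python
# Sketch: per-layer mixed-precision quantization of a client update.
import torch

def quantize_tensor(t, bits):
    """Symmetric uniform quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max() / qmax + 1e-12
    return torch.clamp(torch.round(t / scale), -qmax, qmax) * scale

def quantize_update(state_dict, bit_plan, default_bits=8):
    """Quantize a client update layer by layer with per-layer precision."""
    return {name: quantize_tensor(param, bit_plan.get(name, default_bits))
            for name, param in state_dict.items()}

# Example: first/last layers kept at 8 bits, middle layer squeezed to 4 bits.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10))
update = {k: v.detach().clone() for k, v in model.state_dict().items()}
bit_plan = {"0.weight": 8, "2.weight": 4, "4.weight": 8}
quantized_update = quantize_update(update, bit_plan)
```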
Quantized Adaptive Subgradient Algorithms and Their Applications
Xu, Ke, Wangni, Jianqiao, Zhang, Yifan, Ye, Deheng, Wu, Jiaxiang, Zhao, Peilin
Data explosion and increasing model size drive the remarkable advances in large-scale machine learning, but they also make model training time-consuming and model storage difficult. In the distributed model training setting, which offers high computational efficiency and fewer device limitations, two main difficulties remain. On one hand, the communication cost of exchanging information, e.g., stochastic gradients among different workers, is a key bottleneck for distributed training efficiency. On the other hand, a model with fewer parameters is easier to store and communicate, but risks damaging model performance. To balance communication cost, model capacity, and model performance simultaneously, we propose quantized composite mirror descent adaptive subgradient (QCMD adagrad) and quantized regularized dual averaging adaptive subgradient (QRDA adagrad) for distributed training. Specifically, we explore the combination of gradient quantization and sparse models to reduce the communication cost per iteration in distributed training. A quantized gradient-based adaptive learning rate matrix is constructed to balance communication cost, accuracy, and model sparsity. Moreover, we theoretically show that a large quantization error introduces extra noise, which affects the convergence and sparsity of the model. Therefore, a threshold quantization strategy with a relatively small error is adopted in QCMD adagrad and QRDA adagrad to improve the signal-to-noise ratio and preserve the sparsity of the model. Both theoretical analysis and empirical results demonstrate the efficacy and efficiency of the proposed algorithms.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
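A hedged sketch of a threshold quantization step in the spirit of the abstract above: coordinates below a threshold are dropped (preserving sparsity), and the remaining ones are stochastically quantized to a small set of levels. The threshold rule and level count are assumptions for illustration, not taken from the paper.

```python
# Sketch: threshold quantization that keeps the gradient sparse and low-precision.
import numpy as np

def threshold_quantize(g, threshold_ratio=0.1, levels=4):
    """Sparsify small entries, then stochastically quantize the rest (unbiased)."""
    tau = threshold_ratio * np.abs(g).max()
    mask = np.abs(g) >= tau                      # keep only significant coordinates
    out = np.zeros_like(g)
    kept = g[mask]
    scale = np.abs(kept).max() / levels + 1e-12
    low = np.floor(kept / scale)
    prob = kept / scale - low                    # unbiased stochastic rounding
    q = low + (np.random.rand(kept.size) < prob)
    out[mask] = q * scale
    return out                                   # sparse, low-precision gradient

g = np.random.randn(1000) * np.random.rand(1000)     # mix of small and large entries
g_q = threshold_quantize(g)
print(np.mean(g_q == 0), np.linalg.norm(g - g_q) / np.linalg.norm(g))
```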
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
Novikov, Georgii, Bershatsky, Daniel, Gusak, Julia, Shonenkov, Alex, Dimitrov, Denis, Oseledets, Ivan
Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantizing the gradients. We propose a systematic approach to compute an optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per element. We show that such an approximation can be achieved by computing an optimal piecewise-constant approximation of the derivative of the activation function, which can be done by dynamic programming. Drop-in replacements are implemented for all popular nonlinearities and can be used in any existing pipeline. We confirm the memory reduction and unchanged convergence on several open benchmarks.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
- Asia > Russia (0.05)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (7 more...)
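A minimal PyTorch sketch of the few-bit backward idea described above: during the forward pass, store only a few-bit code per element that selects a piecewise-constant approximation of the activation derivative, and use it in the backward pass. The bin boundaries and slope values below are hand-picked for GELU purely for illustration; the paper obtains optimal ones via dynamic programming.

```python
# Sketch: 2-bit piecewise-constant approximation of the GELU derivative in backward.
import torch

BOUNDARIES = torch.tensor([-1.5, 0.0, 1.5])            # 3 cut points -> 4 bins
DERIV_VALUES = torch.tensor([-0.05, 0.3, 0.9, 1.05])   # hand-picked slope per bin

class FewBitGELU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        code = torch.bucketize(x, BOUNDARIES)           # 2-bit code per element
        ctx.save_for_backward(code.to(torch.uint8))     # stored instead of the full input
        return torch.nn.functional.gelu(x)

    @staticmethod
    def backward(ctx, grad_out):
        (code,) = ctx.saved_tensors
        approx_deriv = DERIV_VALUES[code.long()]        # piecewise-constant derivative
        return grad_out * approx_deriv

x = torch.randn(8, requires_grad=True)
FewBitGELU.apply(x).sum().backward()
print(x.grad)
```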
Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients
Sun, Jun, Chen, Tianyi, Giannakis, Georgios B., Yang, Zaiyue
The present paper develops a novel aggregated gradient approach for distributed machine learning that adaptively compresses the gradient communication. The key idea is to first quantize the computed gradients and then skip less informative quantized gradient communications by reusing outdated gradients. Quantizing and skipping result in `lazy' worker-server communication, which justifies the term Lazily Aggregated Quantized gradient, henceforth abbreviated as LAQ. Our LAQ provably attains the same linear convergence rate as gradient descent in the strongly convex case, while effecting major savings in communication overhead, in both transmitted bits and communication rounds. Empirically, experiments with real data corroborate a significant communication reduction compared to existing gradient- and stochastic-gradient-based algorithms.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (13 more...)
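A hedged sketch of the lazy communication rule described in the LAQ abstract above: each worker quantizes its gradient and uploads it only when it differs enough from the last gradient the server has; otherwise the server reuses the outdated copy. The skip test below is a simplified stand-in for the paper's criterion.

```python
# Sketch: quantize, then skip uploads whose change since the last upload is small.
import numpy as np

def quantize(g, bits=4):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(g).max() / qmax + 1e-12
    return np.round(g / scale) * scale

class LazyQuantizedWorker:
    def __init__(self, skip_tol=0.05):
        self.last_sent = None
        self.skip_tol = skip_tol

    def maybe_send(self, grad):
        """Return (gradient the server will use, whether an upload happened)."""
        q = quantize(grad)
        if self.last_sent is not None and \
           np.linalg.norm(q - self.last_sent) <= self.skip_tol * np.linalg.norm(q):
            return self.last_sent, False   # skip: server reuses the outdated gradient
        self.last_sent = q
        return q, True

worker = LazyQuantizedWorker()
for step in range(5):
    g = np.array([1.0, -2.0]) + 0.001 * np.random.randn(2)  # slowly varying gradient
    sent, uploaded = worker.maybe_send(g)
    print(step, uploaded)    # uploads once, then mostly skips
```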