quantized gradient
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.05)
- (13 more...)
We thank all reviewers for their time and effort in reviewing our paper. We set up experiments in PyTorch with ResNet18 (He et al., 2016) on CIFAR10 (Krizhevsky, 2009). The results on CIFAR10 are shown in Figure 1 above. We will release our code on GitHub in the final version.
Figure 1: Evaluations on CIFAR10: training loss (1st column), test accuracy (2nd column), and total number of transmitted bits (MB).
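A minimal sketch of the experimental setup mentioned in the rebuttal above (ResNet18 trained on CIFAR10 in PyTorch). The transforms, hyperparameters, and training loop below are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: ResNet18 (He et al., 2016) on CIFAR10 (Krizhevsky, 2009) in PyTorch.
# Hyperparameters are illustrative, not the authors' settings.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor(),
                       T.Normalize((0.4914, 0.4822, 0.4465),
                                   (0.2470, 0.2435, 0.2616))])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=5e-4)

for images, labels in train_loader:   # one epoch shown
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```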
Rate-Constrained Quantization for Communication-Efficient Federated Learning
Hamidi, Shayan Mohajer, Bereyhi, Ali
Quantization is a common approach to mitigate the communication cost of federated learning (FL). In practice, the quantized local parameters are further encoded via an entropy coding technique, such as Huffman coding, for efficient data compression. In this case, the exact communication overhead is determined by the bit rate of the encoded gradients. Recognizing this fact, this work deviates from the existing approaches in the literature and develops a novel quantized FL framework, called rate-constrained federated learning (RC-FED), in which the gradients are quantized subject to both fidelity and data rate constraints. We formulate this scheme as a joint optimization in which the quantization distortion is minimized while the rate of the encoded gradients is kept below a target threshold. This enables a tunable trade-off between quantization distortion and communication cost. We analyze the convergence behavior of RC-FED and show its superior performance against baseline quantized FL schemes on several datasets.
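A hypothetical sketch of the rate-constrained idea in the RC-FED abstract: choose the finest uniform quantizer whose entropy-coded rate (estimated by the empirical entropy of the quantized symbols, a proxy for the Huffman bit rate) stays below a target budget. The quantizer, the entropy proxy, and the bit-width search are assumptions for illustration, not the paper's algorithm.

```python
# Sketch: quantize a gradient with the lowest distortion subject to a rate budget.
import numpy as np

def empirical_entropy_bits(symbols):
    """Estimate bits/symbol an entropy coder (e.g. Huffman coding) would need."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def rate_constrained_quantize(grad, rate_budget_bits=3.0, max_bits=8):
    """Finest uniform quantizer whose estimated coded rate meets the budget."""
    scale_ref = np.abs(grad).max() + 1e-12
    for bits in range(max_bits, 0, -1):          # search from finest to coarsest
        levels = 2 ** bits - 1
        step = 2 * scale_ref / levels
        q = np.round(grad / step)                # integer symbols to be entropy coded
        if empirical_entropy_bits(q) <= rate_budget_bits:
            return q * step, bits                # dequantized gradient, chosen bit-width
    return np.sign(grad) * scale_ref, 1          # fall back to 1-bit quantization

g = np.random.randn(10000) * 0.01
g_hat, bits = rate_constrained_quantize(g)
print(bits, np.mean((g - g_hat) ** 2))           # chosen precision and distortion
```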
One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training
Ma, Lianbo, Zhou, Yuee, Ma, Jianlun, Yu, Guo, Li, Qing
Weight quantization is an effective technique to compress deep neural networks for deployment on edge devices with limited resources. Traditional loss-aware quantization methods commonly use the quantized gradient to replace the full-precision gradient. However, we discover that the resulting gradient error leads to an unexpected zig-zagging issue in the gradient descent learning procedure, where the gradient directions rapidly oscillate or zig-zag, and this issue seriously slows down model convergence. Accordingly, this paper proposes a one-step forward and backtrack scheme for loss-aware quantization that obtains a more accurate and stable gradient direction to defy this issue. During gradient descent learning, a one-step forward search is designed to find the trial gradient of the next step, which is adopted to adjust the gradient of the current step towards the direction of fast convergence. After that, we backtrack to the current step and update the full-precision and quantized weights using the current-step gradient and the trial gradient. A series of theoretical analyses and experiments on benchmark deep models demonstrates the effectiveness and competitiveness of the proposed method, which especially outperforms others in convergence performance.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
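A rough sketch of the "one-step forward and backtrack" idea in the abstract above: take a trial step with the current quantized gradient, probe the gradient at the trial point, blend the two directions, then backtrack and update from the original point. The toy quantizer and the blending rule are placeholders, not the paper's exact scheme.

```python
# Sketch: damping zig-zagging by mixing the current and trial quantized gradients.
import numpy as np

def quantize(w, num_levels=16):
    """Toy uniform weight quantizer (stand-in for the paper's quantizer)."""
    scale = np.abs(w).max() / (num_levels // 2) + 1e-12
    return np.round(w / scale) * scale

def forward_backtrack_step(w, grad_fn, lr=0.1, beta=0.5):
    g_now = grad_fn(quantize(w))                  # quantized gradient at current point
    w_trial = w - lr * g_now                      # one-step forward (trial point)
    g_trial = grad_fn(quantize(w_trial))          # trial gradient of the next step
    g_mix = (1 - beta) * g_now + beta * g_trial   # adjusted, less oscillatory direction
    return w - lr * g_mix                         # backtrack: update from the ORIGINAL point

# Example on an ill-conditioned quadratic where plain descent tends to zig-zag.
A = np.diag([1.0, 25.0])
grad_fn = lambda w: A @ w
w = np.array([1.0, 1.0])
for _ in range(50):
    w = forward_backtrack_step(w, grad_fn, lr=0.03)
print(w)   # approaches the minimum at the origin
```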
Mixed Precision Quantization to Tackle Gradient Leakage Attacks in Federated Learning
Ovi, Pretom Roy, Dey, Emon, Roy, Nirmalya, Gangopadhyay, Aryya
Federated Learning (FL) enables collaborative model building among a large number of participants without explicit data sharing. However, this approach is vulnerable to privacy inference attacks. In particular, in the event of a gradient leakage attack, which has a high success rate in retrieving sensitive data from model gradients, FL models are at elevated risk because communication is inherent to their architecture. The most alarming aspect of this attack is that it can be performed so covertly that it does not hamper training performance, while the attackers backtrack from the gradients to obtain information about the raw data. The two most common countermeasures are homomorphic encryption and adding noise with differential privacy parameters, and each has a major drawback: the key generation process becomes tedious as the number of clients grows, and noise-based differential privacy suffers a significant drop in global model accuracy. As a countermeasure, we propose a mixed-precision quantized FL scheme and empirically show that both of the above issues can be resolved. In addition, our approach offers more robustness, as different layers of the deep model are quantized with different precisions and quantization modes. We empirically validated our method on three benchmark datasets and found a minimal accuracy drop in the global model after applying quantization.
- North America > United States > Maryland > Baltimore County (0.05)
- North America > United States > Maryland > Baltimore (0.05)
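A minimal sketch of the layer-wise mixed-precision idea in the abstract above: quantize each layer of a client update with its own bit-width before it leaves the client, so full-precision gradients are never transmitted. The bit-width assignment below is an arbitrary illustration, not the authors' policy.

```python
# Sketch: per-layer mixed-precision quantization of a client update.
import torch

def quantize_tensor(t, bits):
    """Symmetric uniform quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max() / qmax + 1e-12
    return torch.clamp(torch.round(t / scale), -qmax, qmax) * scale

def quantize_update(state_dict, bit_plan, default_bits=8):
    """Quantize a client update layer by layer with per-layer precision."""
    return {name: quantize_tensor(param, bit_plan.get(name, default_bits))
            for name, param in state_dict.items()}

# Example: first/last layers kept at 8 bits, middle layer squeezed to 4 bits.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10))
update = {k: v.detach().clone() for k, v in model.state_dict().items()}
bit_plan = {"0.weight": 8, "2.weight": 4, "4.weight": 8}
quantized_update = quantize_update(update, bit_plan)
```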
Quantized Adaptive Subgradient Algorithms and Their Applications
Xu, Ke, Wangni, Jianqiao, Zhang, Yifan, Ye, Deheng, Wu, Jiaxiang, Zhao, Peilin
Data explosion and increasing model size drive the remarkable advances in large-scale machine learning, but they also make model training time-consuming and model storage difficult. In the distributed model training setting, which offers high computational efficiency and fewer device limitations, two main difficulties remain. On one hand, the communication cost of exchanging information, e.g., stochastic gradients among different workers, is a key bottleneck for distributed training efficiency. On the other hand, a model with fewer parameters is easier to store and communicate, but risks damaging model performance. To balance communication cost, model capacity, and model performance simultaneously, we propose quantized composite mirror descent adaptive subgradient (QCMD adagrad) and quantized regularized dual averaging adaptive subgradient (QRDA adagrad) for distributed training. Specifically, we explore the combination of gradient quantization and sparse models to reduce the communication cost per iteration in distributed training. A quantized gradient-based adaptive learning rate matrix is constructed to balance communication cost, accuracy, and model sparsity. Moreover, we theoretically show that a large quantization error introduces extra noise, which affects the convergence and sparsity of the model. Therefore, a threshold quantization strategy with a relatively small error is adopted in QCMD adagrad and QRDA adagrad to improve the signal-to-noise ratio and preserve the sparsity of the model. Both theoretical analysis and empirical results demonstrate the efficacy and efficiency of the proposed algorithms.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
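A hedged sketch of a threshold quantization step in the spirit of the abstract above: coordinates below a threshold are dropped (preserving sparsity), and the remaining ones are stochastically quantized to a small set of levels. The threshold rule and level count are assumptions for illustration, not taken from the paper.

```python
# Sketch: threshold quantization that keeps the gradient sparse and low-precision.
import numpy as np

def threshold_quantize(g, threshold_ratio=0.1, levels=4):
    """Sparsify small entries, then stochastically quantize the rest (unbiased)."""
    tau = threshold_ratio * np.abs(g).max()
    mask = np.abs(g) >= tau                      # keep only significant coordinates
    out = np.zeros_like(g)
    kept = g[mask]
    scale = np.abs(kept).max() / levels + 1e-12
    low = np.floor(kept / scale)
    prob = kept / scale - low                    # unbiased stochastic rounding
    q = low + (np.random.rand(kept.size) < prob)
    out[mask] = q * scale
    return out                                   # sparse, low-precision gradient

g = np.random.randn(1000) * np.random.rand(1000)     # mix of small and large entries
g_q = threshold_quantize(g)
print(np.mean(g_q == 0), np.linalg.norm(g - g_q) / np.linalg.norm(g))
```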
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
Novikov, Georgii, Bershatsky, Daniel, Gusak, Julia, Shonenkov, Alex, Dimitrov, Denis, Oseledets, Ivan
Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantizing the gradients. We propose a systematic approach to compute an optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per element. We show that such an approximation can be achieved by computing an optimal piecewise-constant approximation of the derivative of the activation function, which can be done by dynamic programming. Drop-in replacements are implemented for all popular nonlinearities and can be used in any existing pipeline. We confirm the memory reduction and unchanged convergence on several open benchmarks.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
- Asia > Russia (0.05)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (7 more...)
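A minimal PyTorch sketch of the few-bit backward idea described above: during the forward pass, store only a few-bit code per element that selects a piecewise-constant approximation of the activation derivative, and use it in the backward pass. The bin boundaries and slope values below are hand-picked for GELU purely for illustration; the paper obtains optimal ones via dynamic programming.

```python
# Sketch: 2-bit piecewise-constant approximation of the GELU derivative in backward.
import torch

BOUNDARIES = torch.tensor([-1.5, 0.0, 1.5])            # 3 cut points -> 4 bins
DERIV_VALUES = torch.tensor([-0.05, 0.3, 0.9, 1.05])   # hand-picked slope per bin

class FewBitGELU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        code = torch.bucketize(x, BOUNDARIES)           # 2-bit code per element
        ctx.save_for_backward(code.to(torch.uint8))     # stored instead of the full input
        return torch.nn.functional.gelu(x)

    @staticmethod
    def backward(ctx, grad_out):
        (code,) = ctx.saved_tensors
        approx_deriv = DERIV_VALUES[code.long()]        # piecewise-constant derivative
        return grad_out * approx_deriv

x = torch.randn(8, requires_grad=True)
FewBitGELU.apply(x).sum().backward()
print(x.grad)
```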
Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients
Sun, Jun, Chen, Tianyi, Giannakis, Georgios B., Yang, Zaiyue
The present paper develops a novel aggregated gradient approach for distributed machine learning that adaptively compresses the gradient communication. The key idea is to first quantize the computed gradients and then skip less informative quantized gradient communications by reusing outdated gradients. Quantizing and skipping result in `lazy' worker-server communication, which justifies the term Lazily Aggregated Quantized gradient, henceforth abbreviated as LAQ. Our LAQ provably attains the same linear convergence rate as gradient descent in the strongly convex case, while effecting major savings in communication overhead, in both transmitted bits and communication rounds. Empirically, experiments with real data corroborate a significant communication reduction compared to existing gradient- and stochastic-gradient-based algorithms.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (13 more...)
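A hedged sketch of the lazy communication rule described in the LAQ abstract above: each worker quantizes its gradient and uploads it only when it differs enough from the last gradient the server has; otherwise the server reuses the outdated copy. The skip test below is a simplified stand-in for the paper's criterion.

```python
# Sketch: quantize, then skip uploads whose change since the last upload is small.
import numpy as np

def quantize(g, bits=4):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(g).max() / qmax + 1e-12
    return np.round(g / scale) * scale

class LazyQuantizedWorker:
    def __init__(self, skip_tol=0.05):
        self.last_sent = None
        self.skip_tol = skip_tol

    def maybe_send(self, grad):
        """Return (gradient the server will use, whether an upload happened)."""
        q = quantize(grad)
        if self.last_sent is not None and \
           np.linalg.norm(q - self.last_sent) <= self.skip_tol * np.linalg.norm(q):
            return self.last_sent, False   # skip: server reuses the outdated gradient
        self.last_sent = q
        return q, True

worker = LazyQuantizedWorker()
for step in range(5):
    g = np.array([1.0, -2.0]) + 0.001 * np.random.randn(2)  # slowly varying gradient
    sent, uploaded = worker.maybe_send(g)
    print(step, uploaded)    # uploads once, then mostly skips
```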