AITopics | low-bitwidth training

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems | Dec-23-2025, 17:51:54 GMT

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full precision counterpart, a procedure known as quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance. Inspired by these theoretical results, we develop two novel gradient quantizers, and we show that these have smaller variance than the existing per-tensor quantizer. For training ResNet-50 on ImageNet, our 5-bit block Householder quantizer achieves only 0.5% validation accuracy loss relative to QAT, comparable to the existing INT8 baseline.
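The unbiasedness and variance claims in the abstract can be illustrated with a small NumPy sketch. It assumes a plain stochastic-rounding quantizer with min/max scaling. The function name `stochastic_round_quantize`, the toy gradient `g`, and the per-row scaling option are hypothetical choices made for illustration and are not the paper's quantizers. The per-row variant is included only to show that a finer-grained scale can lower variance when rows differ widely in magnitude.

```python
import numpy as np


def stochastic_round_quantize(x, bits, rng, per_row=False):
    """Quantize a 2-D array to 2**bits levels with stochastic rounding, then dequantize.

    Illustrative sketch only: min/max scaling, per tensor or per row.
    """
    axis = 1 if per_row else None
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    levels = 2 ** bits - 1
    # Guard against a zero range (constant input).
    scale = np.maximum(hi - lo, 1e-12) / levels
    # Real-valued position of each entry on the integer grid [0, levels].
    t = np.clip((x - lo) / scale, 0, levels)
    floor = np.floor(t)
    # Round up with probability equal to the fractional part, so E[q] = t.
    q = floor + (rng.random(x.shape) < (t - floor))
    return q * scale + lo


rng = np.random.default_rng(0)
# Toy "gradient" whose rows span three orders of magnitude (an assumption).
g = rng.standard_normal((8, 64)) * np.logspace(-3, 0, 8)[:, None]


def estimate(per_row, trials=2000):
    """Monte Carlo estimate of the worst-case bias and the total variance."""
    samples = np.stack(
        [stochastic_round_quantize(g, 5, rng, per_row) for _ in range(trials)]
    )
    bias = np.abs(samples.mean(axis=0) - g).max()
    variance = samples.var(axis=0).sum()
    return bias, variance


bias_tensor, var_tensor = estimate(per_row=False)
bias_row, var_row = estimate(per_row=True)
print(f"per-tensor: max |bias| = {bias_tensor:.4f}, total variance = {var_tensor:.4f}")
print(f"per-row:    max |bias| = {bias_row:.4f}, total variance = {var_row:.4f}")
```

In this sketch the empirical mean of the quantized values stays close to the input under both scalings, which is the sense in which stochastic rounding is unbiased, while the total variance depends on how the scale is chosen.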

low-bitwidth training, name change, statistical framework, (9 more...)

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems | May-26-2025, 15:37:20 GMT

artificial intelligence, low-bitwidth training, machine learning, (8 more...)

Review for NeurIPS paper: A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems | Jan-21-2025, 09:32:17 GMT

Summary and Contributions: The authors analyze the effect of gradient quantization on quantized training in a principled fashion, and introduce two methods that reduce the variance of the gradients during quantized training. I still hold that if FQT is compared to QAT, you should quantize the weights and not keep shadow weights. This is what I meant by having the actual weights quantized, and the updates quantized as well. In most FQT applications that are parallelized in compute, you are very often memory-movement bound, meaning you are playing a game of reducing memory as much as possible. The gradients are calculated on the fly, used, and discarded in the backward pass, so their memory overhead is small.

activation matrix, low-bitwidth training, statistical framework, (7 more...)

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems | Oct-9-2024, 11:33:12 GMT

deep neural network, low-bitwidth training, statistical framework, (5 more...)
