DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

Zhou, Jiajun, Wu, Jiajun, Gao, Yizhao, Ding, Yuhao, Tao, Chaofan, Li, Boyu, Tu, Fengbin, Cheng, Kwang-Ting, So, Hayden Kwok-Hay, Wong, Ngai

Feb-24-2023–arXiv.org Artificial Intelligence

To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is quantizing DNN models to low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work presents DyBit, an adaptive data representation with variable-length encoding. DyBit can dynamically adjust the precision and range of its separate bit-fields to adapt to the distribution of DNN weights and activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy against speedup. Experimental results demonstrate that the inference accuracy with DyBit is 1.997% higher than the state of the art at 4-bit quantization, and that the proposed framework achieves up to 8.1x speedup compared with the original model.
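The accuracy challenge at low bitwidths can be illustrated with a minimal sketch of plain symmetric uniform quantization. This is a generic baseline, not DyBit's variable-length encoding, whose bit-field layout is not given in this abstract. The function names (`quantize_uniform`, `mean_squared_error`) and the Gaussian stand-in for DNN weights are assumptions made for illustration only.

```python
import random


def quantize_uniform(values, bits):
    """Symmetric uniform quantization (illustrative baseline, NOT DyBit).

    Maps each value to a signed integer in [-qmax, qmax] and rescales it,
    so every level is equally spaced regardless of the data distribution.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    quantized = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))  # clamp to the representable range
        quantized.append(q * scale)
    return quantized


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Assumption: a zero-mean Gaussian sample stands in for a layer's weights.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

mse_by_bits = {
    bits: mean_squared_error(weights, quantize_uniform(weights, bits))
    for bits in (8, 4, 2)
}

for bits, mse in mse_by_bits.items():
    print(f"{bits}-bit uniform quantization MSE: {mse:.6f}")
```

Because the uniform grid is stretched to cover the largest weight, most of its levels are spent on sparsely populated tails, and the error grows quickly as the bitwidth shrinks. A representation that adapts its precision and range to the value distribution, as the abstract describes for DyBit, is aimed at this mismatch.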

artificial intelligence, machine learning, quantization


Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
