Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Blumenberg, Patrick, Graave, Thomas, Fingscheidt, Tim
–arXiv.org Artificial Intelligence
Large language models (LLMs) demand extensive memory capacity during both fine-tuning and inference. To enable memory-efficient fine-tuning, existing methods apply block-wise quantization techniques, such as NF4 and AF4, to the network weights. We show that these quantization techniques incur suboptimal quantization errors. Therefore, as a first novelty, we propose an optimization approach for block-wise quantization. Using this method, we design a family of quantizers named 4-bit block-wise optimal float (BOF4), which consistently reduces the quantization error compared to both baseline methods. We provide both a theoretical and a data-driven solution for the optimization process and prove their practical equivalence. Secondly, we propose a modification to the employed normalization method based on the signed absolute block maximum (BOF4-S), enabling further reduction of the quantization error and empirically achieving less degradation in language modeling performance. Thirdly, we explore additional variations of block-wise quantization methods applied to LLMs through an experimental study on the importance of accurately representing zero and large-amplitude weights on the one hand, and optimization towards various error metrics on the other hand. Lastly, we introduce a mixed-precision quantization strategy dubbed outlier-preserving quantization (OPQ) to address the distributional mismatch induced by outlier weights in block-wise quantization. By storing outlier weights in 16-bit precision (OPQ) while applying BOF4-S, we achieve top performance among 4-bit block-wise quantization techniques w.r.t. perplexity.
arXiv.org Artificial Intelligence
May-13-2025
- Country:
- Africa > Rwanda
- Asia > China
- Hong Kong (0.04)
- Europe
- North America
- Canada > Ontario
- Toronto (0.04)
- Mexico > Gulf of Mexico (0.04)
- United States
- California > Santa Clara County
- Santa Clara (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > Santa Clara County
- Canada > Ontario
- Genre:
- Research Report > New Finding (0.66)
- Technology: