R2Q: Towards Robust 2-Bit Large Language Models via Residual Refinement Quantization

Chen, Jiayi, Shi, Jieqi, Huo, Jing, Wu, Chen

Dec-1-2025–arXiv.org Artificial Intelligence

The rapid progress of Large Language Models (LLMs) has brought substantial computational and memory demands, spurring the adoption of low-bit quantization. While 8-bit and 4-bit formats have become prevalent, extending quantization to 2 bits remains challenging due to severe accuracy degradation. To address this, we propose Residual Refinement Quantization (R2Q)-a novel 2-bit quantization framework that decomposes the process into two sequential 1-bit sub-quantizations, forming an adaptive quantization lattice. Extensive evaluations on Llama, OPT, and Qwen across diverse benchmarks-covering question answering, commonsense reasoning, and language modeling-demonstrate that R2Q consistently outperforms existing 2-bit quantization methods in both fine-grained and coarse-grained settings. By refining quantization through a residual learning mechanism, R2Q enhances performance, improves training stability, and accelerates convergence under extreme compression. Furthermore, its modular design enables seamless integration with existing quantization-aware training (QAT) frameworks.

large language model, machine learning, quantization, (20 more...)

arXiv.org Artificial Intelligence

Dec-1-2025

arXiv.org PDF

Add feedback

Country:
- Asia > China
  - Jiangsu Province > Nanjing (0.04)
- North America > United States
  - Virginia (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)
  - Natural Language
    - Chatbot (1.00)
    - Large Language Model (1.00)