LCD: Advancing Extreme Low-Bit Clustering for Large Language Models via Knowledge Distillation
Liu, Fangxin, Yang, Ning, Zhao, Junping, Yang, Tao, Guan, Haibing, Jiang, Li
–arXiv.org Artificial Intelligence
Large language models (LLMs) have achieved significant progress in natural language processing, but their high memory and computational requirements make them difficult to deploy. Weight quantization is a common way to address these issues, yet effective low-bit compression remains challenging. This paper presents LCD, which unifies the learning of clustering-based quantization within a knowledge distillation framework. Using carefully designed optimization techniques, LCD preserves LLM performance even at ultra-low bit widths of 2-3 bits. LCD also compresses activations through smoothing and accelerates inference with a lookup-table (LUT)-based design. Experimental results show that LCD outperforms existing methods and delivers up to a 6.2× inference speedup. Notably, LCD is also shown to be more cost-effective than existing methods, making it a practical solution for real-world applications.
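To make the two core ideas concrete, here is a minimal sketch of generic clustering-based weight quantization and LUT-based inference. It is not LCD's method: it assumes plain 1-D k-means on the raw weights (no knowledge distillation, no activation smoothing), a single codebook shared by the whole matrix, and hypothetical helper names (`kmeans_1d`, `lut_matvec`). With b bits, every weight is replaced by an index into a codebook of 2^b centroids. A matrix-vector product can then precompute each centroid-times-activation product once and build every output row from table lookups and additions.

```python
import random

def kmeans_1d(weights, bits, iters=20):
    """Cluster scalar weights into 2**bits centroids with plain 1-D k-means.

    Illustrative assumption: LCD learns its clustering under distillation,
    while this sketch only minimizes weight reconstruction error.
    """
    k = 2 ** bits
    s = sorted(weights)
    n = len(s)
    # Initialize centroids at evenly spaced quantiles of the sorted weights.
    centroids = [s[min(n - 1, (2 * i + 1) * n // (2 * k))] for i in range(k)]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each weight.
        codes = [min(range(k), key=lambda c: abs(w - centroids[c])) for w in weights]
        # Update step: each centroid moves to the mean of its assigned weights.
        sums, counts = [0.0] * k, [0] * k
        for w, c in zip(weights, codes):
            sums[c] += w
            counts[c] += 1
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [sums[c] / counts[c] if counts[c] else centroids[c] for c in range(k)]
    codes = [min(range(k), key=lambda c: abs(w - centroids[c])) for w in weights]
    return centroids, codes

def lut_matvec(codes, centroids, x):
    """Compute y = W x, where W[i][j] = centroids[codes[i][j]].

    The table of centroid * activation products is built once per input
    vector, so each output row needs only lookups and additions.
    """
    table = [[c * xj for c in centroids] for xj in x]  # table[j][c]
    return [sum(table[j][code] for j, code in enumerate(row)) for row in codes]

# Usage: quantize a small random matrix to 2 bits (4 centroids).
random.seed(0)
rows, cols = 4, 8
W = [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
flat = [w for row in W for w in row]
centroids, flat_codes = kmeans_1d(flat, bits=2)
codes = [flat_codes[i * cols:(i + 1) * cols] for i in range(rows)]
x = [random.gauss(0.0, 1.0) for _ in range(cols)]

# Reference result: dense matvec with the dequantized weights.
W_hat = [[centroids[c] for c in row] for row in codes]
y_ref = [sum(w * xj for w, xj in zip(row, x)) for row in W_hat]
y_lut = lut_matvec(codes, centroids, x)
```

The LUT path and the dequantize-then-multiply path return the same result. The lookup table replaces per-weight multiplications with indexing, and that substitution is the basic source of speedups in LUT-based low-bit kernels. Real kernels operate on packed indices and vectorized tables rather than on Python lists.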
Jun-17-2025