LCD: Advancing Extreme Low-Bit Clustering for Large Language Models via Knowledge Distillation

Liu, Fangxin, Yang, Ning, Zhao, Junping, Yang, Tao, Guan, Haibing, Jiang, Li

Jun-17-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) have made significant progress in natural language processing, but their high memory and computational requirements make deployment difficult. Weight quantization is a common remedy, yet effective low-bit compression remains challenging. This paper presents LCD, which unifies the learning of clustering-based quantization within a knowledge distillation framework. Using carefully designed optimization techniques, LCD preserves LLM performance even at ultra-low bit widths of 2 to 3 bits. LCD also compresses activations through smoothing and accelerates inference with a lookup-table (LUT) based design. Experimental results show that LCD outperforms existing methods and delivers up to a 6.2x inference speedup. The authors also report that LCD is more cost-effective, making it a practical option for real-world applications.
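To make the idea of clustering-based quantization with lookup-table inference concrete, here is a minimal sketch in Python with NumPy. It is not the LCD method: the abstract does not give the algorithm, so this uses plain 1-D k-means with no knowledge distillation and no activation smoothing. The names `kmeans_1d`, `quantize_weights`, and `lut_matvec` are hypothetical, chosen only for illustration.

```python
import numpy as np


def kmeans_1d(values, k, iters=20):
    """Plain Lloyd's k-means on a 1-D array; returns (centroids, codes)."""
    # Spread the initial centroids across the data using evenly spaced quantiles.
    centroids = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign every value to its nearest centroid.
        codes = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = values[codes == j]
            if members.size:
                centroids[j] = members.mean()
    # Final assignment against the updated centroids.
    codes = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, codes.astype(np.uint8)


def quantize_weights(w, bits=2):
    """Cluster all weights of a matrix into 2**bits shared centroids (assumed layout)."""
    centroids, codes = kmeans_1d(w.ravel(), 2 ** bits)
    return centroids, codes.reshape(w.shape)


def lut_matvec(centroids, codes, x):
    """Matrix-vector product using only the codebook and the integer codes.

    For each output row, activations are summed per cluster index, then
    combined with the centroids: one multiply per centroid instead of one
    multiply per weight.
    """
    k = centroids.shape[0]
    out = np.empty(codes.shape[0])
    for i in range(codes.shape[0]):
        bucket_sums = np.bincount(codes[i], weights=x, minlength=k)
        out[i] = bucket_sums @ centroids
    return out


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 64))   # toy weight matrix (assumption, not from the paper)
x = rng.normal(size=64)        # toy activation vector

centroids, codes = quantize_weights(w, bits=2)
w_hat = centroids[codes]                 # dequantized weights via table lookup
y_lut = lut_matvec(centroids, codes, x)  # matches w_hat @ x up to rounding
```

With `bits=2`, each weight is stored as an index into a table of four centroids, and `lut_matvec` produces the same result as multiplying by the dequantized matrix. How LCD actually learns its clusters and structures its LUT kernel is described in the paper, not here.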

large language model, machine learning, quantization

Country:
- North America > United States > Minnesota (0.28)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.69)
