OneBit: Towards Extremely Low-bit Large Language Models

Neural Information Processing Systems 

Model quantization represents the weight matrices of existing models with low bit-width values, a promising approach to reducing both the storage and computational overheads of deploying highly anticipated LLMs. However, current quantization methods suffer severe performance degradation when the bit-width is reduced to extremely low levels, and thus mostly rely on 4-bit or 8-bit values to quantize models.
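For context, the sketch below illustrates the standard round-to-nearest weight quantization this passage refers to, and how reconstruction error grows as the bit-width shrinks. This is a generic baseline, not OneBit's method; the function names and per-matrix scaling scheme are illustrative assumptions.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int):
    """Symmetric round-to-nearest quantization of a weight matrix.

    Maps float weights to integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    using a single per-matrix scale (an illustrative choice; real methods
    typically use finer-grained, e.g. per-channel or per-group, scales).
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.abs(w).max() / qmax             # per-matrix scaling factor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Example: quantization error grows sharply as the bit-width is reduced,
# which is why most methods stop at 4-bit or 8-bit.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
for bits in (8, 4, 2):
    q, s = quantize_rtn(w, bits)
    err = np.abs(w - dequantize(q, s)).mean()
    print(f"{bits}-bit mean abs reconstruction error: {err:.4f}")
```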