OneBit: Towards Extremely Low-bit Large Language Models

Mar-21-2026, 05:41:25 GMT–Neural Information Processing Systems

Model quantization represents the weight matrices of an existing model with low bit-width values, a promising way to reduce both the storage and the computational overhead of deploying LLMs. However, current quantization methods suffer severe performance degradation at extremely low bit-widths, and therefore focus on quantizing models to 4-bit or 8-bit values.
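To make the bit-width trade-off concrete, here is a minimal sketch in plain Python. It is not the OneBit method: it assumes a simple round-to-nearest symmetric scheme for 8-bit and 4-bit, and a sign-only scheme scaled by the mean magnitude for 1-bit. The function names (`quantize_symmetric`, `mean_squared_error`) and the Gaussian toy weights are hypothetical, chosen only for illustration.

```python
import random


def quantize_symmetric(weights, bits):
    """Illustrative quantizer: maps each weight to a low-bit value and back.

    Assumption: round-to-nearest symmetric quantization for bits >= 2,
    and sign-only quantization (scaled by mean |w|) for bits == 1.
    """
    if bits == 1:
        # 1-bit: keep only the sign, with one shared magnitude.
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]


def mean_squared_error(original, approx):
    """Average squared difference between original and dequantized weights."""
    return sum((x - y) ** 2 for x, y in zip(original, approx)) / len(original)


# Toy "weight matrix": 4096 Gaussian values with a fixed seed for repeatability.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]

errors = {
    bits: mean_squared_error(weights, quantize_symmetric(weights, bits))
    for bits in (8, 4, 1)
}
for bits, err in errors.items():
    print(f"{bits}-bit reconstruction MSE: {err:.3e}")
```

On this toy data the reconstruction error rises as the bit-width falls from 8 to 4 to 1. That is a rough proxy for why naive schemes degrade at extremely low bit-widths; it says nothing about downstream model quality.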

artificial intelligence, large language model, natural language
