OneBit: Towards Extremely Low-bit Large Language Models
Neural Information Processing Systems
Model quantization represents the weight matrices of an existing model with low bit-width values, which is a promising approach to reducing both the storage and computational overheads of deploying LLMs. However, current quantization methods suffer severe performance degradation when the bit-width is reduced to extremely low values, and therefore focus on quantizing models to 4-bit or 8-bit values.
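To make the bit-width trade-off concrete, here is a minimal sketch of generic round-to-nearest symmetric uniform quantization applied to random weights. It is not the OneBit method; the function names (`quantize_dequantize`, `mean_squared_error`), the Gaussian weight distribution, and the per-tensor scale are all assumptions chosen for illustration.

```python
import random


def quantize_dequantize(weights, bits):
    """Round-to-nearest symmetric uniform quantization, mapped back to floats.

    Illustrative only: one scale per tensor, signed integer levels in
    [-qmax, qmax]. Real quantizers are usually per-channel or per-group.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for w in weights:
        q = round(w / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        dequantized.append(q * scale)   # back to floating point
    return dequantized


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Assumed stand-in for a weight matrix: 4096 standard-normal values.
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

errors = {
    bits: mean_squared_error(weights, quantize_dequantize(weights, bits))
    for bits in (8, 4, 2)
}
for bits, err in errors.items():
    storage_bytes = len(weights) * bits // 8
    print(f"{bits}-bit: {storage_bytes} bytes, reconstruction MSE {err:.6f}")
```

Under these assumptions, storage shrinks in proportion to the bit-width while the reconstruction error grows as levels are removed, which is the tension the abstract describes.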
Mar-21-2026, 05:41:25 GMT