OneBit: Towards Extremely Low-bit Large Language Models
–Neural Information Processing Systems
Model quantization represents the weight matrices of existing models with low bit-width values, a promising approach to reducing both the storage and computational overheads of deploying LLMs. However, current quantization methods suffer severe performance degradation when the bit-width is reduced to extremes, and thus focus on using 4-bit or 8-bit values to quantize models.
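To illustrate the trade-off the abstract describes, here is a minimal sketch of symmetric per-tensor weight quantization (an illustrative baseline, not the OneBit method itself): the reconstruction error grows as the bit-width shrinks, which is why methods typically settle on 4-bit or 8-bit values.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 4):
    """Map float weights to signed integers with one shared scale."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.abs(w).max() / qmax        # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

q4, s4 = quantize_symmetric(w, bits=4)
q8, s8 = quantize_symmetric(w, bits=8)

# Mean absolute reconstruction error: larger at lower bit-width,
# which drives the degradation the abstract refers to.
err4 = float(np.abs(dequantize(q4, s4) - w).mean())
err8 = float(np.abs(dequantize(q8, s8) - w).mean())
```

Extremely low-bit settings (e.g. 1-bit, as OneBit targets) push this error far higher still, which is the failure mode the paper addresses.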
May-30-2025, 06:54:02 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Education (0.93)
- Information Technology > Software (0.46)
- Technology: