ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Neural Information Processing Systems
The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose that 1.58-bit offers superior results. However, the lack of a cohesive framework for comparing different bit-widths has left such conclusions relatively tenuous.
Jun-19-2026, 05:12:24 GMT