Understanding the difficulty of low-precision post-training quantization of large language models