VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
