Q-VLM: Post-training Quantization for Large Vision-Language Models
Neural Information Processing Systems
In this paper, we propose a post-training quantization framework for large vision-language models (LVLMs) for efficient multi-modal inference. Conventional quantization methods sequentially search the layer-wise rounding functions by minimizing activation discretization errors, which fails to acquire an optimal quantization strategy because cross-layer dependency is ignored.
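The conventional baseline described above can be illustrated with a minimal sketch (not the paper's method, and all names here are illustrative): each layer's quantization scale is grid-searched in isolation to minimize that layer's own activation discretization error, with no account of how one layer's rounding affects the next.

```python
import numpy as np

def quantize(x, scale, n_bits=4):
    """Uniform symmetric quantization of x with a given scale."""
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(activations, n_bits=4, n_candidates=100):
    """Grid-search the scale minimizing this layer's activation MSE."""
    max_val = np.abs(activations).max()
    best_scale, best_err = None, np.inf
    for ratio in np.linspace(0.5, 1.0, n_candidates):
        scale = ratio * max_val / (2 ** (n_bits - 1) - 1)
        err = np.mean((quantize(activations, scale, n_bits) - activations) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# Sequential layer-wise search: each layer's scale is chosen
# independently, ignoring cross-layer dependency.
rng = np.random.default_rng(0)
layer_acts = [rng.normal(size=(64, 128)) for _ in range(3)]
scales = [search_scale(a) for a in layer_acts]
```

Because each `search_scale` call sees only one layer's activations, errors introduced early in the network are never reflected in later layers' scale choices; this is precisely the cross-layer dependency the paper's framework is motivated by.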