GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Neural Information Processing Systems
Vision Transformers (ViTs) are essential in computer vision but are also computationally intensive. Model quantization, particularly to low bit-widths such as 4-bit, aims to alleviate this cost, yet existing Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) methods exhibit significant limitations. PTQ often incurs a substantial accuracy drop, while QAT achieves high accuracy but suffers from prohibitive computational cost, limited generalization to downstream tasks, training instability, and a lack of open-source codebases. To address these challenges, this paper introduces General, Practical, and Lightning Quantization (GPLQ), a novel framework designed for efficient and effective ViT quantization. GPLQ is founded on two key empirical insights: the paramount importance of activation quantization and the necessity of preserving the model's original optimization basin to maintain generalization. Consequently, GPLQ employs a sequential activation-first, weights-later strategy. Stage 1 keeps weights in FP32 while quantizing activations, training with a feature mimicking loss for only 1 epoch so that the model stays in the same optimization basin, thereby preserving generalization.
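As a rough illustration of the Stage 1 idea, the sketch below fake-quantizes a vector of activations to 4 bits and measures how far the result drifts from the full-precision features. It is a minimal sketch under stated assumptions, not the paper's implementation: the uniform symmetric quantizer, the fixed `scale`, the mean-squared-error form of the mimicking loss, and the function names `fake_quantize` and `feature_mimicking_loss` are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
def fake_quantize(values, scale, bits=4):
    """Uniform symmetric fake quantization (assumed form).

    Each value is mapped to the nearest integer level, clamped to the
    signed range for the given bit-width, then rescaled back to floats.
    """
    qmin = -(2 ** (bits - 1))       # -8 for 4-bit
    qmax = 2 ** (bits - 1) - 1      # +7 for 4-bit
    quantized = []
    for v in values:
        level = round(v / scale)
        level = max(qmin, min(qmax, level))  # clamp out-of-range levels
        quantized.append(level * scale)
    return quantized


def feature_mimicking_loss(student_feats, teacher_feats):
    """Mean squared error between student and teacher features (assumed form)."""
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(squared) / len(squared)


# Full-precision (teacher) activations; weights are untouched in this stage.
teacher = [0.12, -0.55, 1.30, 3.90, -4.20]

# Activation-quantized (student) features: [0.0, -0.5, 1.5, 3.5, -4.0]
student = fake_quantize(teacher, scale=0.5)

# The loss that training would reduce so the quantized model stays close
# to the full-precision one.
loss = feature_mimicking_loss(student, teacher)
print(student, loss)
```

In an actual training loop the quantizer parameters would presumably be learned and the loss computed on intermediate ViT features; here the point is only that the student is pulled toward the full-precision features rather than trained from task labels alone.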
Jun-10-2026, 02:22:51 GMT