PARQ: Piecewise-Affine Regularized Quantization

Lisa Jin, Jianhao Ma, Zechun Liu, Andrey Gromov, Aaron Defazio, Lin Xiao

arXiv.org Artificial Intelligence 

Modern deep learning models exhibit exceptional vision and language processing capabilities, but come at the cost of excessive model sizes and demands on memory and computing. Quantization is an effective approach to model compression that can significantly reduce memory footprint, computing cost, and inference latency (e.g., Han et al., 2016; Sze et al., 2017). There are two main classes of quantization methods: post-training quantization (PTQ) and quantization-aware training (QAT). Both are widely adopted and have received extensive research attention; see the recent survey papers (Gholami et al., 2022; Fournarakis et al., 2022) and the references therein. PTQ converts the weights of a pre-trained model directly into lower precision without repeating the training pipeline; it thus has less overhead and is relatively easy to apply (Nagel et al., 2020; Cai et al., 2020; Chee et al., 2024). However, it is mainly limited to regimes of 4 or more bits and can suffer steep performance drops with fewer bits (Yao et al., 2022; Dettmers & Zettlemoyer, 2023). This is especially the case for transformer-based models, which have proven harder to quantize (Bai et al., 2021; Qin et al., 2022) than convolutional architectures (Martinez et al., 2019; Qin et al., 2020). QAT, on the other hand, integrates quantization into the pre-training and/or fine-tuning process and can produce low-bit (especially binary) models with mild performance degradation (e.g.
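To make the PTQ/QAT distinction concrete, here is a minimal sketch (not the PARQ method proposed in this paper): PTQ applies round-to-nearest uniform quantization once to pre-trained weights, while QAT inserts a fake-quantization step with a straight-through gradient into the training loop. The 4-bit width and per-tensor symmetric scale are illustrative assumptions.

```python
# Minimal illustration of PTQ vs. QAT-style fake quantization (not PARQ itself).
import torch


def uniform_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric round-to-nearest quantization of a weight tensor (PTQ-style)."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 levels on each side for 4 bits
    scale = w.abs().max() / qmax            # per-tensor scale (an assumption)
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale                        # dequantized weights used at inference


class FakeQuantSTE(torch.autograd.Function):
    """Fake quantization with a straight-through gradient, as commonly used in QAT."""

    @staticmethod
    def forward(ctx, w, bits=4):
        return uniform_quantize(w, bits)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat quantization as identity in the backward pass.
        return grad_output, None


# PTQ: quantize a pre-trained weight tensor once, with no retraining.
w = torch.randn(256, 256)
w_ptq = uniform_quantize(w, bits=4)

# QAT: the forward pass sees quantized weights, so training adapts to quantization error.
w_train = torch.randn(256, 256, requires_grad=True)
loss = (FakeQuantSTE.apply(w_train, 4) ** 2).mean()
loss.backward()                              # gradients flow through the STE
```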
