A probabilistic framework for dynamic quantization
Santini, Gabriele, Paissan, Francesco, Farella, Elisabetta
arXiv.org Artificial Intelligence
We propose a probabilistic framework for dynamic quantization of neural networks that enables computationally efficient, input-adaptive rescaling of the quantization parameters. Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate, so the quantization parameters can be adjusted on a per-input basis without significant memory overhead. We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance. Compared to standard quantization strategies, our method achieves the best tradeoff between performance and computational overhead.
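The abstract does not spell out the surrogate, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the pre-activations of a linear layer y = Wx + b are roughly Gaussian, that the weights can be treated as i.i.d. with known mean and variance, and that the quantization range is set to the estimated mean plus or minus k standard deviations. The function names (surrogate_stats, quant_params, quantize, dequantize) and the parameter k are hypothetical.

```python
import math

def surrogate_stats(x, w_mean, w_var, b_mean=0.0):
    """Hypothetical surrogate: estimate the mean and std of y = W x + b
    from the input alone, assuming i.i.d. weights with known mean/variance.
    Costs O(len(x)) and never evaluates the layer itself."""
    s1 = sum(x)                      # sum of inputs
    s2 = sum(v * v for v in x)       # sum of squared inputs
    mean = w_mean * s1 + b_mean      # E[y_j] = mu_w * sum_i x_i + E[b]
    var = w_var * s2                 # Var[y_j] = sigma_w^2 * sum_i x_i^2
    return mean, math.sqrt(var)

def quant_params(mean, std, k=3.0, bits=8):
    """Derive per-input affine quantization parameters from the estimated
    distribution (assumed range: mean +/- k*std, widened to include 0)."""
    lo = min(mean - k * std, 0.0)
    hi = max(mean + k * std, 0.0)
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax
    if scale == 0.0:                 # degenerate input (e.g. all zeros)
        scale = 1.0
    zero_point = max(0, min(qmax, round(-lo / scale)))
    return scale, zero_point

def quantize(values, scale, zero_point, bits=8):
    """Standard affine quantization to unsigned integers with clamping."""
    qmax = 2 ** bits - 1
    return [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate real values."""
    return [(v - zero_point) * scale for v in q]

# Usage: the parameters adapt to each input without a calibration pass.
x = [1.0, 2.0, 3.0]
mean, std = surrogate_stats(x, w_mean=0.0, w_var=0.01)
scale, zero_point = quant_params(mean, std)
```

In this sketch an input with larger magnitude yields a wider estimated range and therefore a larger scale, which is the per-input adaptation the abstract describes; the Gaussian and i.i.d.-weight assumptions are illustrative choices only.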
May-19-2025