Differentiable Model Compression via Pseudo Quantization Noise

Défossez, Alexandre; Adi, Yossi; Synnaeve, Gabriel

Apr-20-2021–arXiv.org Artificial Intelligence

We propose to add independent pseudo quantization noise to model parameters during training to approximate the effect of a quantization operator. This method, DiffQ, is differentiable both with respect to the unquantized parameters and with respect to the number of bits used. Given a single hyper-parameter expressing the desired balance between quantized model size and accuracy, DiffQ can optimize the number of bits used per individual weight or per group of weights in a single training run. We experimentally verify that our method outperforms state-of-the-art quantization techniques on several benchmarks and architectures for image classification, language modeling, and audio source separation. For instance, on the Wikitext-103 language modeling benchmark, DiffQ compresses a 16-layer transformer model by a factor of 8, equivalent to 4-bit precision, while losing only 0.5 points of perplexity. Code is available at: https://github.com/facebookresearch/diffq
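To make the core idea of the abstract concrete, here is a minimal plain-Python sketch contrasting true rounding with pseudo quantization noise. It assumes uniform quantization of a weight group over its [min, max] range with step Δ = (max - min) / (2^B - 1), and noise drawn as Δ/2 · U[-1, 1]; these are illustrative assumptions, and every helper name (`step_size`, `true_quantize`, `pseudo_quantize`, `d_step_d_bits`, `compression_ratio`) is hypothetical, not the API of the diffq package.

```python
import math
import random


def step_size(w_min: float, w_max: float, bits: float) -> float:
    """Width of one bin for uniform quantization over [w_min, w_max].

    `bits` may be fractional, which is what lets it act as a continuous,
    trainable quantity (assumed formula, see lead-in).
    """
    return (w_max - w_min) / (2.0 ** bits - 1.0)


def true_quantize(weights, bits: int):
    """Round each weight to the nearest of 2**bits levels (not differentiable)."""
    w_min, w_max = min(weights), max(weights)
    if w_max == w_min:
        return list(weights)  # a constant group needs no quantization
    delta = step_size(w_min, w_max, bits)
    return [w_min + round((w - w_min) / delta) * delta for w in weights]


def pseudo_quantize(weights, bits: float, rng: random.Random):
    """Replace rounding with independent uniform noise of the same maximal size.

    Each weight gets its own draw u ~ U[-1, 1]; the result w + delta/2 * u is a
    smooth function of both the weights and `bits`.
    """
    w_min, w_max = min(weights), max(weights)
    delta = step_size(w_min, w_max, bits)
    return [w + 0.5 * delta * rng.uniform(-1.0, 1.0) for w in weights]


def d_step_d_bits(w_min: float, w_max: float, bits: float) -> float:
    """Analytic derivative of step_size with respect to the bit count."""
    two_b = 2.0 ** bits
    return -(w_max - w_min) * two_b * math.log(2.0) / (two_b - 1.0) ** 2


def compression_ratio(bits_per_weight, baseline_bits: int = 32) -> float:
    """Size reduction versus storing every weight with `baseline_bits` bits."""
    mean_bits = sum(bits_per_weight) / len(bits_per_weight)
    return baseline_bits / mean_bits


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(8)]
    print(true_quantize(weights, 4))
    print(pseudo_quantize(weights, 4.5, rng))  # fractional bit counts are allowed
    print(compression_ratio([4] * 8))  # 8.0
```

Both perturbations are bounded by Δ/2, but only the noisy version has a usable gradient with respect to B, which is why a bit count can be learned by gradient descent. The last helper also spells out the arithmetic behind "a factor of 8, equivalent to 4 bits", assuming 32-bit floating-point weights as the baseline: 32 / 4 = 8.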

accuracy, differentiable model compression, quantization

Country:
- North America > Canada > Ontario > Toronto (0.04)

Genre:
- Research Report (0.50)

Industry:
- Energy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.87)
  - Vision (0.67)
  - Machine Learning > Neural Networks
    - Deep Learning (0.69)