Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing

Apr-30-2026, 05:24:39 GMT–Neural Information Processing Systems

Transformer models have been widely adopted in various domains in recent years, and large language models in particular have advanced the field of AI significantly. Due to their size, the capability of these networks has increased tremendously, but this has come at the cost of a significant increase in necessary compute. Quantization is one of the most effective ways to reduce the computational time and memory consumption of neural networks. Many studies have shown, however, that modern transformer models tend to learn strong outliers in their activations, making them difficult to quantize. To retain acceptable performance, these outliers force activations to be kept at a higher bitwidth, or require different numeric formats, extra fine-tuning, or other workarounds.
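The abstract does not spell out why a few outliers make activations hard to quantize. The following is a minimal sketch of one common explanation, assuming symmetric per-tensor uniform quantization (an assumption for illustration, not necessarily the scheme used in the paper). The function names and the activation values are hypothetical. A single large value stretches the quantization scale, so the remaining small values are rounded much more coarsely.

```python
def quantize_dequantize(values, num_bits=8):
    """Symmetric per-tensor uniform quantization, then dequantization.

    Assumed scheme for illustration only: one scale for the whole tensor,
    chosen so that the largest magnitude maps to the largest integer level.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    recon = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # integer level
        recon.append(q * scale)                      # back to float
    return recon


def mean_abs_error(values, num_bits=8):
    """Average absolute difference between values and their quantized form."""
    recon = quantize_dequantize(values, num_bits)
    return sum(abs(a - b) for a, b in zip(values, recon)) / len(values)


# Hypothetical activations: 100 evenly spaced values in [-1, 1].
base = [-1.0 + 2.0 * i / 99 for i in range(100)]
# The same activations plus one strong outlier (made-up magnitude).
with_outlier = base + [60.0]

err_base = mean_abs_error(base)
err_outlier = mean_abs_error(with_outlier)

print(f"mean abs error without outlier: {err_base:.5f}")
print(f"mean abs error with outlier:    {err_outlier:.5f}")
```

In this sketch the outlier itself is represented exactly, but the error on the ordinary values grows by well over an order of magnitude, which is the kind of degradation that pushes practitioners toward higher bitwidths or other workarounds.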

large language model, machine learning, natural language

Country:
- Europe (0.92)
- Asia > Middle East (0.28)
- North America > United States
  - Minnesota (0.28)

Genre:
- Research Report (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
