Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Neural Information Processing Systems
The Transformer architecture has become a fundamental building block of widespread natural language processing (NLP) models. As NLP models grow larger, their increasing memory and computation costs hinder efficient deployment on resource-limited devices, so transformer quantization has attracted wide research interest. Recent work recognizes that structured outliers are the critical bottleneck for quantization performance; however, the proposed remedies add computation overhead while still leaving the outliers in place.
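To make the outlier bottleneck concrete, here is a minimal NumPy sketch, not the paper's method, with illustrative values: a single large-magnitude channel stretches the per-tensor quantization scale, which coarsens the grid for all the well-behaved values and inflates their quantization error.

```python
# Minimal sketch (hypothetical values): one structured outlier channel
# stretches the per-tensor scale and degrades precision everywhere else.
import numpy as np

def quantize_dequantize(x, num_bits=8):
    """Symmetric uniform per-tensor quantization followed by dequantization."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8-bit
    scale = np.abs(x).max() / qmax          # an outlier inflates this scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
activations = rng.normal(0.0, 1.0, size=(4, 256))  # well-behaved activations
activations[:, 0] *= 60.0                          # one structured outlier channel

dense = activations[:, 1:]                         # the non-outlier channels
err_with_outlier = np.mean((quantize_dequantize(activations)[:, 1:] - dense) ** 2)
err_without = np.mean((quantize_dequantize(dense) - dense) ** 2)
print(f"MSE on normal channels, outlier present: {err_with_outlier:.6f}")
print(f"MSE on normal channels, outlier removed: {err_without:.6f}")
```

Running this shows the mean squared error on the normal channels growing by orders of magnitude when the outlier channel is present, which is why suppressing such outliers, rather than working around them, is the focus of the paper.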