ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Neural Information Processing Systems
How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging, even for powerful cloud servers, due to their prohibitive memory and computation requirements. In this work, we present an efficient and affordable post-training quantization approach to compress large Transformer-based models, termed ZeroQuant.
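Post-training quantization converts a trained model's weights (and activations) from FP16/FP32 to low-bit integers without any retraining. As illustrative background only, here is a minimal sketch of symmetric per-tensor INT8 quantization, the basic primitive such schemes build on; this is not ZeroQuant's specific method, and the function names are hypothetical.

```python
import torch

def quantize_int8(x: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127]."""
    scale = x.abs().max().item() / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: float) -> torch.Tensor:
    """Recover an approximate float tensor from INT8 values and the scale."""
    return q.to(torch.float32) * scale

# Example: quantize a weight matrix after training, with no fine-tuning.
w = torch.randn(768, 768)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"max abs reconstruction error: {(w - w_hat).abs().max():.6f}")
```

Storing the INT8 tensor plus one float scale cuts weight memory roughly 4x relative to FP32 (2x relative to FP16), at the cost of the small reconstruction error printed above.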
efficient and affordable post-training quantization, large-scale transformer, zeroquant