ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Neural Information Processing Systems
How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging, even for powerful cloud servers, due to their prohibitive memory and computation requirements. In this work, we present an efficient and affordable post-training quantization approach to compress large Transformer-based models, termed ZeroQuant.
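Post-training quantization converts a trained model's weights (and activations) from FP16/FP32 to low-bit integers without any retraining. As illustrative background only, here is a minimal sketch of symmetric per-tensor INT8 quantization, the basic primitive such schemes build on; this is not ZeroQuant's specific method, and the function names are hypothetical.

```python
import torch

def quantize_int8(x: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127]."""
    scale = x.abs().max().item() / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: float) -> torch.Tensor:
    """Recover an approximate float tensor from INT8 values and the scale."""
    return q.to(torch.float32) * scale

# Example: quantize a weight matrix after training, with no fine-tuning.
w = torch.randn(768, 768)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"max abs reconstruction error: {(w - w_hat).abs().max():.6f}")
```

Storing the INT8 tensor plus one float scale cuts weight memory roughly 4x relative to FP32 (2x relative to FP16), at the cost of the small reconstruction error printed above.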
efficient and affordable post-training quantization, large-scale transformer, zeroquant