ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Neural Information Processing Systems

How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging even for powerful cloud servers due to their prohibitive memory/computation requirements. In this work, we present an efficient and affordable post-training quantization approach to compress large Transformer-based models, termed ZeroQuant.
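
The core idea named in the abstract is post-training quantization: converting a trained model's FP32 weights to low-bit integers (e.g., INT8) without retraining. As a rough illustration of that general idea only (not ZeroQuant's actual scheme, which the abstract does not detail), the sketch below applies symmetric per-tensor INT8 quantization to a single weight matrix; the tensor shape and function names are hypothetical.

import torch

def symmetric_int8_quantize(w: torch.Tensor):
    # Map the largest magnitude in the tensor to the INT8 range [-127, 127].
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate FP32 tensor from the INT8 values and the scale.
    return q.to(torch.float32) * scale

# Example: quantize one (hypothetical) Transformer weight matrix after training,
# with no fine-tuning, and check the reconstruction error.
w = torch.randn(768, 768)
q, scale = symmetric_int8_quantize(w)
w_hat = dequantize(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())

Storing q (1 byte/value) plus a single scale instead of FP32 weights (4 bytes/value) cuts weight memory roughly 4x, which is the kind of memory/computation saving the abstract motivates.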
