ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Neural Information Processing Systems
How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging even for powerful cloud servers due to their prohibitive memory/computation requirements.
Aug-17-2025, 16:26:34 GMT
- Country:
- Asia > China
- Zhejiang Province > Hangzhou (0.04)
- North America > United States
- Washington > King County > Seattle (0.04)
- Genre:
- Research Report (0.46)
- Technology: