ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Neural Information Processing Systems

How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging even for powerful cloud servers due to their prohibitive memory/computation requirements. In this work, we present an efficient and affordable post-training quantization approach to compress large Transformer-based models, termed ZeroQuant.
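
The core idea named in the abstract is post-training quantization: converting a trained model's FP32 weights to low-bit integers (e.g., INT8) without retraining. As a rough illustration of that general idea only (not ZeroQuant's actual scheme, which the abstract does not detail), the sketch below applies symmetric per-tensor INT8 quantization to a single weight matrix; the tensor shape and function names are hypothetical.

import torch

def symmetric_int8_quantize(w: torch.Tensor):
    # Map the largest magnitude in the tensor to the INT8 range [-127, 127].
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate FP32 tensor from the INT8 values and the scale.
    return q.to(torch.float32) * scale

# Example: quantize one (hypothetical) Transformer weight matrix after training,
# with no fine-tuning, and check the reconstruction error.
w = torch.randn(768, 768)
q, scale = symmetric_int8_quantize(w)
w_hat = dequantize(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())

Storing q (1 byte/value) plus a single scale instead of FP32 weights (4 bytes/value) cuts weight memory roughly 4x, which is the kind of memory/computation saving the abstract motivates.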
