QTIP: Quantization with Trellises and Incoherence Processing

Neural Information Processing Systems 

Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing weights to low-precision datatypes. Since LLM inference is usually memory-bound, PTQ methods can improve inference throughput.
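
As a rough illustration of the memory-footprint claim, the sketch below estimates weight storage at different bit widths. The helper name `weight_memory_gib` and the 7-billion-parameter model size are assumptions chosen for illustration, not values from the paper, and the estimate ignores quantization overhead such as scales or codebooks.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB.

    Counts only the raw weight bits; scales, codebooks, activations,
    and the KV cache are deliberately left out of this estimate.
    """
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 7_000_000_000

if __name__ == "__main__":
    for bits in (16, 4, 2):
        size = weight_memory_gib(NUM_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {size:.2f} GiB")
```

When inference is memory-bound, the time spent streaming weights from memory scales roughly with this size, which is why fewer bits per weight can translate into higher throughput.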
