QTIP: Quantization with Trellises and Incoherence Processing

Neural Information Processing Systems

Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing weights to low-precision datatypes. Since LLM inference is usually memory-bound, PTQ methods can improve inference throughput.
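
As a point of reference for what weight quantization does, here is a minimal round-to-nearest sketch in NumPy. It is a generic PTQ baseline, not QTIP's trellis-coded quantizer, and the symmetric per-tensor 4-bit scheme is an illustrative assumption.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4):
    """Round-to-nearest symmetric quantization: a generic PTQ baseline,
    not QTIP's trellis-coded quantizer."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed codes
    scale = np.abs(w).max() / qmax             # per-tensor scale (illustrative)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale                            # dequantize as q * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_rtn(w)
# 4-bit codes cut weight memory roughly 8x versus fp32, which is why
# PTQ helps when inference throughput is bound by memory bandwidth.
```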


QuIP: 2-Bit Quantization of Large Language Models With Guarantees

Neural Information Processing Systems

We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight.
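
The incoherence-processing step (2) can be sketched directly: multiply the weights and the proxy Hessian by random orthogonal matrices, quantize in the rotated basis, then undo the rotation. The Gaussian-QR construction below is an illustrative stand-in (QuIP itself uses cheaper structured orthogonal matrices), and the adaptive rounding of step (1) is left abstract.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Haar-random orthogonal matrix via QR of a Gaussian matrix
    (an illustrative stand-in for QuIP's structured constructions)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))             # sign fix for uniformity

rng = np.random.default_rng(0)
d_out, d_in = 512, 1024
W = rng.standard_normal((d_out, d_in))         # layer weights
X = rng.standard_normal((4096, d_in))          # calibration activations
H = X.T @ X / len(X)                           # quadratic proxy Hessian

U = random_orthogonal(d_out, rng)
V = random_orthogonal(d_in, rng)
W_inc = U @ W @ V                              # incoherent weights: magnitudes even out
H_inc = V.T @ H @ V                            # matching Hessian transform

# Step (1), left abstract here: quantize W_inc with adaptive rounding,
# then post-process with the inverse rotations:
#   W_hat = U.T @ quantized(W_inc) @ V.T
# which recovers the original layer up to quantization error.
```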


RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm

Yang, Yongyi, Gao, Jianyang, Hu, Wei

arXiv.org Artificial Intelligence

Post-training Quantization (PTQ) has become a widely used technique for improving the inference efficiency of large language models (LLMs). However, existing PTQ methods generally suffer from crucial limitations such as heavy calibration data requirements and an inflexible choice of the target number of bits. In this paper, we propose RaanA, a unified PTQ framework that overcomes these challenges by introducing two novel components: 1) RaBitQ-H, a variant of the randomized vector quantization method RaBitQ, designed for fast, accurate, and highly efficient quantization; and 2) AllocateBits, an algorithm that optimally allocates bit-widths across layers based on their quantization sensitivity. RaanA achieves competitive performance with state-of-the-art quantization methods while being extremely fast, requiring minimal calibration data, and enabling flexible bit allocation. Extensive experiments demonstrate RaanA's efficacy in balancing efficiency and accuracy. The code is publicly available at https://github.com/FFTYYY/RaanA.
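
The bit-allocation component lends itself to a small worked example. The sketch below is hypothetical: it assumes per-layer sensitivity estimates `sensitivity[l][b]` (estimated error of layer l at b bits) and minimizes their sum under a total bit budget via dynamic programming; the paper's actual objective and solver may differ.

```python
def allocate_bits(sensitivity, total_budget, choices=(2, 3, 4, 8)):
    """Hypothetical dynamic-programming sketch of sensitivity-aware bit
    allocation; not RaanA's exact AllocateBits formulation.
    sensitivity[l][b] = estimated quantization error of layer l at b bits."""
    dp = {0: (0.0, [])}                        # bits used -> (error, per-layer widths)
    for layer in sensitivity:
        nxt = {}
        for used, (err, widths) in dp.items():
            for b in choices:
                if used + b > total_budget:
                    continue
                cand = (err + layer[b], widths + [b])
                if used + b not in nxt or cand[0] < nxt[used + b][0]:
                    nxt[used + b] = cand
        dp = nxt
    return min(dp.values())[1]                 # lowest-error feasible allocation

# Toy sensitivities: layer 0 is the most quantization-sensitive.
sens = [{2: 9.0, 3: 4.0, 4: 1.0, 8: 0.1},
        {2: 2.0, 3: 1.0, 4: 0.5, 8: 0.05},
        {2: 1.0, 3: 0.5, 4: 0.2, 8: 0.02}]
print(allocate_bits(sens, total_budget=12))    # -> [4, 4, 4] on this toy input
```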


Compressing Large Language Models using Low Rank and Low Precision Decomposition

Neural Information Processing Systems

Due to the correlated nature of the language syntax and semantics learned during training, the weight matrices of LLMs often exhibit redundancy, which manifests as a low-rank structure. This redundancy suggests the potential for compression without substantial loss in performance.
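
The low-rank half of such a decomposition is just truncated SVD; the sketch below shows only that half and omits the low-precision residual factor the paper pairs it with.

```python
import numpy as np

def low_rank_approx(W, rank):
    """Truncated SVD: the low-rank half of a low-rank + low-precision
    decomposition (the quantized residual factor is omitted here)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] * s[:rank]                 # (d_out, rank), absorbs singular values
    R = Vt[:rank]                              # (rank, d_in)
    return L, R                                # W ≈ L @ R

W = np.random.randn(1024, 1024)
L, R = low_rank_approx(W, rank=128)
# Storage falls from d_out*d_in to rank*(d_out + d_in) parameters:
# 1024*1024 -> 128*(1024 + 1024), a 4x reduction before any quantization.
```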



PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

Neural Information Processing Systems

There has been significant interest in "extreme" compression of large language models (LLMs), i.e., to 1-2 bits per parameter, which allows such models to be executed efficiently on resource-constrained devices.
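
For context on what PV-Tuning moves beyond: the straight-through estimator (STE) rounds in the forward pass and pretends rounding was the identity in the backward pass. Below is a minimal PyTorch sketch of that baseline trick, not of PV-Tuning's own update rule.

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Straight-through estimator: round forward, identity backward.
    This is the baseline gradient trick that PV-Tuning improves on,
    not PV-Tuning itself."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                        # gradient passes straight through

w = torch.randn(8, requires_grad=True)
loss = (RoundSTE.apply(w) ** 2).sum()
loss.backward()
print(w.grad)                                  # 2 * round(w): nonzero despite rounding
```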

