quip
- North America > United States (0.46)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Government (0.93)
- Information Technology > Security & Privacy (0.45)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- North America > United States (0.04)
- Europe > United Kingdom > England (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > New Jersey (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Germany > Berlin (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Information Technology (0.45)
- Education (0.45)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre-and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight.
RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm
Yang, Yongyi, Gao, Jianyang, Hu, Wei
Post-training Quantization (PTQ) has become a widely used technique for improving inference efficiency of large language models (LLMs). However, existing PTQ methods generally suffer from crucial limitations such as heavy calibration data requirements and inflexible choice of target number of bits. In this paper, we propose RaanA, a unified PTQ framework that overcomes these challenges by introducing two novel components: 1) RaBitQ-H, a variant of a randomized vector quantization method RaBitQ, designed for fast, accurate, and highly efficient quantization; and 2) AllocateBits, an algorithm that optimally allocates bit-widths across layers based on their quantization sensitivity. RaanA achieves competitive performance with state-of-the-art quantization methods while being extremely fast, requiring minimal calibration data, and enabling flexible bit allocation. Extensive experiments demonstrate RaanA's efficacy in balancing efficiency and accuracy. The code is publicly available at https://github.com/FFTYYY/RaanA .
- North America > United States (0.46)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Government (0.93)
- Information Technology > Security & Privacy (0.45)
- North America > United States (0.04)
- Europe > United Kingdom > England (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Information Technology (0.45)
- Education (0.45)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > New Jersey (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Germany > Berlin (0.04)