AITopics

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Neural Information Processing Systems, Feb-17-2026, 15:44:36 GMT

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Code is available at github.com/spcl/QuaRot.

large language model, machine learning, quantization, (19 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Austria (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Neural Information Processing Systems, Feb-17-2026, 01:22:36 GMT

9febda1c8344cc5f2d51713964864e93-Paper-Conference.pdf

large language model, machine learning, quantization, (18 more...)

Country:

Asia > China > Hong Kong (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.46)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision (0.92)
(2 more...)

Neural Information Processing Systems, Oct-10-2025, 14:05:15 GMT

b5b939436789f76f08b9d0da5e81af7c-Paper-Conference.pdf

matrix, quantization, quarot, (14 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Austria (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Neural Information Processing Systems, Oct-10-2025, 11:39:13 GMT

DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs

Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations that impede efficient low-bit representation.

arxiv preprint arxiv, outlier, quantization, (13 more...)

Country:

Asia > China > Hong Kong (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.46)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

arXiv.org Artificial Intelligence, Sep-17-2025

Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs

Guo, Hang, Li, Yawei, Benini, Luca

Recent advances in Large Language Model (LLM) compression, such as quantization and pruning, have achieved notable success. However, as these techniques gradually approach their respective limits, relying on a single method for further compression has become increasingly challenging. In this work, we explore an alternative solution by combining quantization and sparsity. This joint approach, though promising, introduces new difficulties due to the inherently conflicting requirements on weight distributions: quantization favors compact ranges, while pruning benefits from high variance. To attack this problem, we propose Optimal Brain Restoration (OBR), a general and training-free framework that aligns pruning and quantization by error compensation between both. OBR minimizes performance degradation on downstream tasks by building on a second-order Hessian objective, which is then reformulated into a tractable problem through surrogate approximation and ultimately reaches a closed-form solution via group error compensation. Experiments show that OBR enables aggressive W4A4KV4 quantization with 50% sparsity on existing LLMs, and delivers up to 4.72x speedup and 6.4x memory reduction compared to the FP16-dense baseline.
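To make the combined setting concrete, here is a minimal Python sketch of the naive baseline that joint methods aim to improve on: magnitude pruning to 50% sparsity followed by symmetric 4-bit rounding. This is not OBR; it contains no Hessian-based error compensation. The function names, the per-tensor scale, and the sample weights are assumptions made purely for illustration.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (naive pruning)."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int4(weights):
    """Symmetric per-tensor 4-bit quantization; returns integer codes and scale."""
    max_abs = max(abs(v) for v in weights)
    scale = max_abs / 7 if max_abs else 1.0
    # Signed 4-bit codes live in [-8, 7].
    codes = [max(-8, min(7, round(v / scale))) for v in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


# Hypothetical weight row, chosen only for illustration.
w = [0.9, -0.05, 0.3, -1.4, 0.02, 0.6, -0.25, 0.1]
pruned = magnitude_prune(w, sparsity=0.5)
codes, scale = quantize_int4(pruned)
restored = dequantize(codes, scale)
```

In this baseline the two steps are applied independently, so the pruning error and the rounding error simply add up; the abstract describes compensating between the two instead.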

large language model, machine learning, quantization, (18 more...)

2509.11177

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

arXiv.org Artificial Intelligence, Aug-15-2025

Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free

Choi, Euntae, Song, Sumin, Lim, Woosang, Yoo, Sungjoo

Large Language Models (LLMs) face deployment challenges due to high computational costs, and while Post-Training Quantization (PTQ) offers a solution, existing rotation-based methods struggle at very low bit-widths like 2-bit. We introduce a novel, training-free approach to construct an improved rotation matrix, addressing the limitations of current methods. The key contributions include leveraging the Walsh-Hadamard transform with sequency ordering, which clusters similar frequency components to reduce quantization error compared to standard Hadamard matrices, significantly improving performance. Furthermore, we propose a Grouped Sequency-arranged Rotation (GSR) using block-diagonal matrices with smaller Walsh blocks, effectively isolating outlier impacts and achieving performance comparable to optimization-based methods without requiring any training. Our method demonstrates robust performance on reasoning tasks and Perplexity (PPL) score on WikiText-2. Our method also enhances results even when applied over existing learned rotation techniques.
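The two ingredients named in the abstract, sequency ordering and a block-diagonal rotation, can be sketched in plain Python as follows. This is an illustrative reading of the abstract rather than the authors' implementation: the helper names are hypothetical, sequency is taken to mean the number of sign changes along a row, and the block size is an arbitrary choice.

```python
import math


def sylvester_hadamard(n):
    """Unnormalized +1/-1 Hadamard matrix of order n (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sign_changes(row):
    """Sequency of a row: how many times the sign flips along it."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def walsh_sequency(n):
    """Hadamard rows re-ordered by increasing number of sign changes."""
    return sorted(sylvester_hadamard(n), key=sign_changes)


def grouped_rotation(dim, block):
    """Block-diagonal orthogonal matrix built from normalized Walsh blocks."""
    if dim % block:
        raise ValueError("dim must be a multiple of block")
    w = walsh_sequency(block)
    scale = 1.0 / math.sqrt(block)
    r = [[0.0] * dim for _ in range(dim)]
    for start in range(0, dim, block):
        for i in range(block):
            for j in range(block):
                r[start + i][start + j] = w[i][j] * scale
    return r


def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


# A single outlier in the first group is spread only within that group.
rotation = grouped_rotation(8, 4)
rotated = apply(rotation, [8.0, 0, 0, 0, 0, 0, 0, 0])
```

Because the matrix is block-diagonal, an outlier channel affects only the channels in its own group, which is one way to read the abstract's claim about isolating outlier impacts.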

artificial intelligence, large language model, natural language, (16 more...)

2505.0381

Country:

Europe (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Czakó, Patrik, Kertész, Gábor, Szénási, Sándor

SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs

arXiv.org Artificial Intelligence, Jul-30-2025

We present SmoothRot, a novel post-training quantization technique to enhance the efficiency of 4-bit quantization in Large Language Models (LLMs). SmoothRot addresses the critical challenge of massive activation outliers by integrating channel-wise scaling with Hadamard transformations. Our technique effectively transforms extreme outliers into quantization-friendly activations, significantly improving quantization accuracy. Experiments conducted on popular LLMs (LLaMA2 7B, LLaMA3.1 8B, and Mistral 7B) demonstrate that SmoothRot consistently reduces the performance gap between quantized and FP16 models by approximately 10-30% across language generation and zero-shot reasoning tasks, without introducing additional inference latency.

Large Language Models (LLMs) [1]-[3] have shown remarkable capabilities in natural language processing, becoming central to many artificial intelligence applications. However, the rapid increase in model sizes required to achieve these impressive results has significantly raised their training and inference costs in terms of time, memory, and energy consumption compared to smaller models [4].
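The combination of channel-wise scaling and a Hadamard rotation can be illustrated with a small plain-Python sketch. It is not the SmoothRot implementation: the scale formula below is the SmoothQuant-style rule with a migration strength `alpha`, which is an assumption here, and all function names and sample values are hypothetical. The point is only that both transformations leave the layer output unchanged while shrinking the activation range.

```python
import math


def hadamard(n):
    """Orthonormal Hadamard matrix of order n (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def max_abs(m):
    return max(abs(v) for row in m for v in row)


def smooth_scales(x, w, alpha=0.5):
    """Per-channel scales (assumed SmoothQuant-style rule; needs nonzero maxima)."""
    scales = []
    for j in range(len(w)):
        x_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        scales.append((x_max ** alpha) / (w_max ** (1 - alpha)))
    return scales


def smooth_and_rotate(x, w, alpha=0.5):
    """Return (X diag(1/s) H, H^T diag(s) W), whose product equals X W."""
    s = smooth_scales(x, w, alpha)
    h = hadamard(len(w))
    x_s = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * s[j] for v in row] for j, row in enumerate(w)]
    return matmul(x_s, h), matmul(transpose(h), w_s)


# Hypothetical activations (2 tokens x 4 channels) with a massive outlier channel.
x = [[0.5, -60.0, 0.2, 0.1], [0.3, 55.0, -0.4, 0.2]]
w = [[0.1, -0.2], [0.05, 0.1], [0.3, 0.2], [-0.1, 0.4]]
x_rot, w_rot = smooth_and_rotate(x, w)
```

Since the scaling can be folded into the weights and the Hadamard matrix is orthogonal, `matmul(x_rot, w_rot)` matches `matmul(x, w)` up to floating-point error, while the largest activation magnitude in `x_rot` is far smaller than in `x`.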

large language model, machine learning, natural language, (18 more...)

2506.05413

Country: Europe > Hungary (0.16)

Genre: Research Report (1.00)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Neural Information Processing Systems, May-27-2025, 13:41:40 GMT

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

outlier-free 4-bit inference, quarot, rotated llm, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Maisonnave, Lucas, Moineau, Cyril, Bichler, Olivier, Rastello, Fabrice

Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs

arXiv.org Artificial Intelligence, May-14-2025

Large language models (LLMs) have become pivotal in artificial intelligence, demonstrating strong capabilities in reasoning, understanding, and generating data. However, their deployment on edge devices is hindered by their substantial size, often reaching several billion parameters. Quantization is a widely used method to reduce memory usage and inference time; however, LLMs present unique challenges due to the prevalence of outliers in their activations. In this work, we leverage the theoretical advantages of Hadamard matrices over random rotation matrices to push the boundaries of quantization in LLMs. We demonstrate that Hadamard matrices are more effective in reducing outliers, which are a significant obstacle in achieving low-bit quantization. Our method, based on a gradual binary search, enables 3-bit quantization for weights, activations, and key-value (KV) caches, resulting in a 40% increase in accuracy on common benchmarks compared to SoTA methods. We extend the use of rotation matrices to support non-power-of-2 embedding dimensions, similar to the Qwen architecture, by employing the Paley algorithm. We theoretically demonstrate the superiority of Hadamard matrices in reducing outliers. We achieved 3-bit quantization for weights, activations, and KV cache, significantly enhancing model performance. Our experimental results on multiple model families such as Mistral, LLaMA, and Qwen demonstrate the effectiveness of our approach, outperforming existing methods and enabling practical 3-bit quantization.
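As an illustration of how a Hadamard matrix can exist for a size that is not a power of 2, the sketch below builds one with the Paley type-I construction and uses it to spread a single outlier. It is a minimal example under stated assumptions, not the paper's method: the paper may use a different Paley variant or combine blocks differently, and the function names, the choice q = 11 (giving order 12), and the sample vector are all hypothetical.

```python
import math


def paley_hadamard(q):
    """Paley type-I Hadamard matrix of order q + 1, for a prime q with q % 4 == 3."""
    is_prime = q > 1 and all(q % p for p in range(2, math.isqrt(q) + 1))
    if not is_prime or q % 4 != 3:
        raise ValueError("q must be a prime congruent to 3 mod 4")
    residues = {(v * v) % q for v in range(1, q)}

    def chi(a):
        """Quadratic character mod q: 0 for 0, +1 for residues, -1 otherwise."""
        a %= q
        return 0 if a == 0 else (1 if a in residues else -1)

    n = q + 1
    h = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or i == 0:
                h[i][j] = 1          # diagonal and first row
            elif j == 0:
                h[i][j] = -1         # first column below the diagonal
            else:
                h[i][j] = chi(i - j)  # skew-symmetric Jacobsthal block
    return h


def rotate(h, x):
    """Apply the orthonormal rotation H / sqrt(n) to a vector."""
    scale = 1.0 / math.sqrt(len(h))
    return [scale * sum(a * b for a, b in zip(row, x)) for row in h]


# Order 12 is not a power of 2, so the Sylvester construction does not apply.
H = paley_hadamard(11)
x = [0.0] * 12
x[5] = 6.0            # one massive outlier channel
y = rotate(H, x)      # every entry now has magnitude 6 / sqrt(12)
```

Every entry of a Hadamard matrix has magnitude 1, so a single outlier of size a is spread evenly into n entries of size a / sqrt(n) while the vector norm is preserved; this is the outlier-reduction effect the abstract refers to.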

artificial intelligence, large language model, natural language, (17 more...)

2504.13989

Country: Europe > France (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)