AITopics | quantized model

1dba3025b159cd9354da65e2d0436a31-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 00:29:07 GMT

artificial intelligence, machine learning, quantization, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Neural Information Processing SystemsMar-21-2026, 13:15:10 GMT

Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to $1.99$ bits, achieving a model with $7.9\times$ smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

Exploiting LLM Quantization

Neural Information Processing SystemsMar-20-2026, 07:12:17 GMT

Quantization leverages lower-precision weights to reduce the memory usage of large language models (LLMs) and is a key technique for enabling their deployment on commodity hardware. While LLM quantization's impact on utility has been extensively explored, this work for the first time studies its adverse effects from a security perspective. We reveal that widely used quantization methods can be exploited to produce a harmful quantized LLM, even though the full-precision counterpart appears benign, potentially tricking users into deploying the malicious quantized model. We demonstrate this threat using a three-staged attack framework: (i) first, we obtain a malicious LLM through fine-tuning on an adversarial task; (ii) next, we quantize the malicious model and calculate constraints that characterize all full-precision models that map to the same quantized model; (iii) finally, using projected gradient descent, we tune out the poisoned behavior from the full-precision model while ensuring that its weights satisfy the constraints computed in step (ii). This procedure results in an LLM that exhibits benign behavior in full precision but when quantized, it follows the adversarial behavior injected in step (i). We experimentally demonstrate the feasibility and severity of such an attack across three diverse scenarios: vulnerable code generation, content injection, and over-refusal attack. In practice, the adversary could host the resulting full-precision model on an LLM community hub such as Hugging Face, exposing millions of users to the threat of deploying its malicious quantized version on their devices.

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

HitNet: Hybrid Ternary Recurrent Neural Network

Neural Information Processing SystemsMar-16-2026, 23:00:27 GMT

Quantization is a promising technique to reduce the model size, memory footprint, and massive computation operations of recurrent neural networks (RNNs) for embedded devices with limited resources. Although extreme low-bit quantization has achieved impressive success on convolutional neural networks, it still suffers from huge accuracy degradation on RNNs with the same low-bit precision. In this paper, we first investigate the accuracy degradation on RNN models under different quantization schemes, and the distribution of tensor values in the full precision model. Our observation reveals that due to the difference between the distributions of weights and activations, different quantization methods are suitable for different parts of models. Based on our observation, we propose HitNet, a hybrid ternary recurrent neural network, which bridges the accuracy gap between the full precision model and the quantized model. In HitNet, we develop a hybrid quantization method to quantize weights and activations. Moreover, we introduce a sloping factor motivated by prior work on Boltzmann machine to activation functions, further closing the accuracy gap between the full precision model and the quantized model.

artificial intelligence, deep learning, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

e0e956681b04ac126679e8c7dd706b2e-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 10:41:31 GMT

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Leisure & Entertainment (0.93)
Education (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
(2 more...)

Add feedback

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Y ang Sui 1,2, Y anyu Li

Neural Information Processing SystemsFeb-16-2026, 12:57:51 GMT

Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Israel (0.04)
North America > United States > Texas (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.92)
Media > Photography (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

Neural Information Processing SystemsFeb-16-2026, 06:16:00 GMT

In this work, we present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization.

artificial intelligence, machine learning, quantization, (20 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

7a7a3f53faafc0161be0fcb57e5fa078-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 00:44:15 GMT

large language model, machine learning, quantization, (21 more...)

Neural Information Processing Systems

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
North America > United States > Minnesota (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.93)
Information Technology > Software (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
(2 more...)

Add feedback

HitNet: Hybrid Ternary Recurrent Neural Network

Peiqi Wang, Xinfeng Xie, Lei Deng, Guoqi Li, Dongsheng Wang, Yuan Xie

Neural Information Processing SystemsFeb-13-2026, 12:16:39 GMT

Recurrent Neural Networks (RNNs) yield great results across many natural language processing applications, including speech recognition, machine translation, language modeling, and question answering [1,2,3,4,5].

machine learning, natural language, quantization, (21 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploiting LLM Quantization

Neural Information Processing SystemsFeb-12-2026, 11:38:18 GMT

While LLM quantization's impact on utility has been extensively explored, this work for the first time studies its adverse effects from a security perspective.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: