Ternary Quantization


Tequila: Trapping-free Ternary Quantization for Large Language Models

Huang, Hong, Wu, Decheng, Cen, Rui, Yu, Guanghua, Li, Zonghang, Liu, Kai, Zhu, Jianchen, Chen, Peng, Liu, Xue, Wu, Dapeng

arXiv.org Artificial Intelligence

Quantization techniques are essential for deploying Large Language Models (LLMs) on edge devices. However, prevailing methods often rely on mixed-precision multiplication that lacks efficient hardware support, making them impractical on such devices. Ternary quantization sidesteps this problem, but such aggressive compression leads to significant accuracy degradation, even after costly quantization-aware training on massive data. We identify the core issue as deadzone trapping: a large number of weights become stuck at the deadzone boundary. These weights receive only noisy, uninformative gradients, preventing a stable escape from the deadzone and severely impeding model capacity and optimization. To address this issue, we propose Tequila, a trapping-free quantization optimization method that reactivates deadzone-trapped weights by repurposing them as dynamic biases. The repurposed weights provide a continuous signal in the forward pass and, critically, receive direct, meaningful gradient signals during backpropagation, enhancing model capacity and optimization with nearly zero inference overhead. Extensive evaluations demonstrate that Tequila outperforms state-of-the-art (SOTA) ternary quantization methods across five benchmarks. On the ARC benchmark, it achieves a >4% accuracy gain over the SOTA baseline, nearly matching full-precision performance (within a <1% gap) with a 3.0× inference speedup. Tequila thus offers a highly practical and efficient path to deploying advanced LLMs in resource-constrained environments. Recent advancements in large language models (LLMs) (Wu et al., 2023; Floridi & Chiriatti, 2020; Zhang et al., 2022) have demonstrated remarkable success across a wide range of applications, from conversational chatbots to creative writing.
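The deadzone mechanism the abstract describes can be sketched in a few lines. The bias-repurposing step below is one plausible reading of the abstract, not the paper's implementation; `delta` and `eps` are illustrative hyperparameters.

```python
import numpy as np

def ternary_forward(x, W, delta, eps=1e-3):
    """Sketch: ternary projection plus deadzone-trapped weights
    repurposed as a continuous (bias-like) contribution."""
    Q = np.sign(W) * (np.abs(W) > delta)        # {-1, 0, +1} projection
    trapped = np.abs(np.abs(W) - delta) < eps   # stuck at the deadzone boundary
    y = x @ Q.T                                 # hardware-friendly ternary matmul
    # Trapped weights contribute their full-precision value, so they emit a
    # continuous forward signal and can receive direct gradients in training.
    y += x @ (W * trapped).T
    return y

x = np.random.default_rng(0).normal(size=(4, 8))
W = np.random.default_rng(1).normal(scale=0.05, size=(16, 8))
print(ternary_forward(x, W, delta=0.05).shape)
```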


Binary and Ternary Quantization Can Enhance Feature Discrimination

Lu, Weizhi, Chen, Mingrui, Li, Weiyu

arXiv.org Artificial Intelligence

Quantization is widely applied in machine learning to reduce computational and storage costs for both data and models. Considering that classification tasks are fundamental to the field, it is crucial to investigate how quantization impacts classification performance. Traditional research has focused on quantization errors, assuming that larger errors generally lead to lower classification accuracy. However, this assumption lacks a solid theoretical foundation and often contradicts empirical observations. For example, despite introducing significant errors, $\{0,1\}$-binary and $\{0, \pm1\}$-ternary quantized data have sometimes achieved classification accuracy comparable to, or even superior to, that of full-precision data. To explain this phenomenon convincingly, a more accurate evaluation of classification performance is required. To this end, we propose a direct analysis of the feature discrimination of quantized data, instead of focusing on quantization errors. Our analysis reveals that both binary and ternary quantization can potentially enhance, rather than degrade, the feature discrimination of the original data. This finding is supported by classification experiments on both synthetic and real data.
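The question the paper studies can be illustrated on toy data. The discrimination proxy, class distributions, and threshold below are my own illustrative assumptions, not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 500                      # illustrative sizes
a = rng.normal(+0.1, 1.0, (n, d))    # class 1
b = rng.normal(-0.1, 1.0, (n, d))    # class 2

def ternary(x, t):
    # {0, +/-1}-ternary quantization of the data with threshold t
    return np.sign(x) * (np.abs(x) > t)

def discrimination(u, v):
    # crude proxy: between-class mean distance over within-class spread
    return np.linalg.norm(u.mean(0) - v.mean(0)) / (u.std() + v.std())

print("full precision:", discrimination(a, b))
print("ternary:       ", discrimination(ternary(a, 1.0), ternary(b, 1.0)))
```

Depending on the threshold, the ternary ratio can approach or exceed the full-precision one, since large-magnitude entries carry most of the class signal.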


An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits

Steinmetz, Cody, Childress, Gavin, Herbst, Aaron, Jones, Gavin, Singh, Jasdeep, Vang, Eli, Weinstock, Keagan

arXiv.org Artificial Intelligence

Large language models (LLMs) have transformed natural-language processing, yet their scale makes real-world deployment costly. Post-training quantization reduces memory and computation but often degrades accuracy, while quantization-aware training can recover performance at the cost of extra training. Pushing quantization to the ternary (2-bit) regime yields even larger savings but is notoriously unstable. Building on recent work showing that a bias-free, RMS-normalized Transformer with straight-through estimation can reach 1.58-bit precision, we demonstrate that simply inserting RMS normalization before every linear projection and applying a gradual, layer-wise quantization schedule stably fine-tunes full-precision checkpoints into ternary LLMs. Our approach matches or surpasses more elaborate knowledge-distillation pipelines on standard language-modeling benchmarks without adding model complexity. These results indicate that careful normalization alone can close much of the accuracy gap between ternary and full-precision LLMs, making ultra-low-bit inference practical.
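A minimal sketch of the recipe the abstract describes: RMS normalization immediately before a ternary linear projection. The threshold and per-tensor scaling rule here are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def rmsnorm(x, eps=1e-6):
    # RMSNorm: divide by the root-mean-square over the feature dim
    # (no mean subtraction, no learned gain in this sketch)
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def ternary_linear(x, W, delta):
    # 1.58-bit weights: {-1, 0, +1} times a per-tensor scale
    mask = np.abs(W) > delta
    scale = np.abs(W[mask]).mean() if mask.any() else 1.0
    return x @ (scale * np.sign(W) * mask).T

# RMSNorm immediately before the ternary projection keeps activations
# well-scaled, which the abstract credits for stable fine-tuning.
x = np.random.default_rng(0).normal(size=(2, 16))
W = np.random.default_rng(1).normal(scale=0.02, size=(32, 16))
y = ternary_linear(rmsnorm(x), W, delta=0.01)
print(y.shape)
```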


Reviews: HitNet: Hybrid Ternary Recurrent Neural Network

Neural Information Processing Systems

The authors study the problem of quantizing recurrent neural networks. While extremely low-bit quantization (2-bit quantization) has achieved strong results for CNNs, such quantization has so far performed poorly for recurrent neural networks. The goal of this paper is thus to identify the reason for this observation and to propose an extreme quantization scheme better suited to RNNs. First, the authors compare different weight quantizations: 2-bit uniform quantization, thresholded ternary quantization (TTQ), and Bernoulli ternary quantization (BTQ). This comparison is performed using an RNN trained on Penn TreeBank.
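The two ternary weight quantizers the review mentions can be written down compactly. These are generic textbook forms, not necessarily the exact variants evaluated in the paper:

```python
import numpy as np

def ttq(W, t):
    # Thresholded ternary quantization: deterministic, |w| <= t maps to 0
    return np.sign(W) * (np.abs(W) > t)

def btq(W, rng):
    # Bernoulli ternary quantization: keep sign(w) with probability
    # min(|w|, 1), otherwise 0 (a stochastic rounding scheme)
    keep = rng.random(W.shape) < np.clip(np.abs(W), 0.0, 1.0)
    return np.sign(W) * keep

W = np.random.default_rng(0).normal(scale=0.5, size=(4, 4))
print(ttq(W, 0.3))
print(btq(W, np.random.default_rng(1)))
```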



Ternary Quantization: A Survey

Liu, Dan, Liu, Xue

arXiv.org Artificial Intelligence

Inference time, model size, and accuracy are critical for deploying deep neural network models. Numerous research efforts have sought to compress neural network models while achieving faster inference and higher accuracy. Pruning and quantization are the mainstream methods to this end. During model quantization, converting the individual float values of layer weights to low-precision ones can substantially reduce the computational overhead and improve inference speed. Many quantization methods have been studied, for example, vector quantization, low-bit quantization, and binary/ternary quantization. This survey focuses on ternary quantization. We review the evolution of ternary quantization and investigate the relationships among existing ternary quantization methods from the perspective of projection functions and optimization methods.
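One widely used projection function in the line of work the survey covers is the Ternary Weight Networks rule, where the 0.7 factor is that paper's heuristic for an approximately optimal threshold:

```python
import numpy as np

def twn_project(W):
    """Threshold-based ternary projection in the TWN style:
    delta = 0.7 * mean|W|, alpha = mean magnitude above delta."""
    delta = 0.7 * np.abs(W).mean()
    mask = np.abs(W) > delta
    alpha = np.abs(W[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(W) * mask

W = np.random.default_rng(0).normal(size=(64, 64))
T = twn_project(W)
print(np.unique(np.sign(T)))   # the three ternary levels, scaled by alpha
```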


Hyperspherical Loss-Aware Ternary Quantization

Liu, Dan, Liu, Xue

arXiv.org Artificial Intelligence

Most existing works use projection functions for ternary quantization in discrete space. Scaling factors and thresholds are used in some cases to improve model accuracy. However, the gradients used for optimization are inaccurate and result in a notable accuracy gap between the full-precision and ternary models. To get more accurate gradients, some works gradually increase the discrete portion of the full-precision weights in the forward pass, e.g., using a temperature-based sigmoid function. Instead of directly performing ternary quantization in discrete space, we push full-precision weights close to ternary ones through a regularization term prior to ternary quantization. In addition, inspired by the temperature-based method, we introduce a re-scaling factor to obtain more accurate gradients by simulating the derivative of the sigmoid function. Experimental results show that our method significantly improves the accuracy of ternary quantization in both image classification and object detection tasks.
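The two ingredients the abstract names can be sketched as follows. Both forms are illustrative guesses at the shape of the method, not the paper's exact formulation; `alpha` and `tau` are assumed names:

```python
import numpy as np

def ternary_regularizer(W, alpha):
    # Pull each weight toward its nearest target in {-alpha, 0, +alpha};
    # one illustrative form of "pushing weights close to ternary ones".
    targets = np.array([-alpha, 0.0, alpha])
    d = np.abs(W[..., None] - targets)        # distance to each target
    return (d.min(axis=-1) ** 2).sum()

def rescaled_sigmoid_grad(w, tau):
    # Surrogate gradient simulating the derivative of a temperature-based
    # sigmoid; tau plays the role of the re-scaling factor.
    s = 1.0 / (1.0 + np.exp(-w / tau))
    return s * (1.0 - s) / tau

W = np.linspace(-1.0, 1.0, 5)
print(ternary_regularizer(W, alpha=0.5))   # 0.5
```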


Smart Ternary Quantization

Morin, Grégoire, Razani, Ryan, Nia, Vahid Partovi, Sari, Eyyüb

arXiv.org Machine Learning

Neural network models are resource hungry. Low-bit quantization, such as binary and ternary quantization, is a common approach to alleviating these resource requirements. Ternary quantization provides a more flexible model and often beats binary quantization in accuracy, but it doubles memory and increases computation cost. Mixed quantization-depth models, on the other hand, allow a trade-off between accuracy and memory footprint. In such models, the quantization depth is often chosen manually (a tedious task) or tuned with a separate optimization routine (which requires training a quantized network multiple times). Here, we propose Smart Ternary Quantization (STQ), which modifies the quantization depth directly through an adaptive regularization function, so that the model is trained only once. The method jumps between binary and ternary quantization during training. We show its application to image classification.


Deep Neural Network Compression with Single and Multiple Level Quantization

Xu, Yuhui, Wang, Yongzhuang, Zhou, Aojun, Lin, Weiyao, Xiong, Hongkai

arXiv.org Machine Learning

Network quantization is an effective solution for compressing deep neural networks for practical use. Existing network quantization methods cannot sufficiently exploit depth information to generate low-bit compressed networks. In this paper, we propose two novel network quantization approaches: single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ) for extremely low-bit (ternary) quantization. We are the first to consider network quantization at both the width and depth levels. At the width level, parameters are divided into two parts: one for quantization and the other for re-training to eliminate the quantization loss. SLQ leverages the distribution of the parameters to improve the width level. At the depth level, we introduce incremental layer compensation to quantize layers iteratively, which decreases the quantization loss in each iteration. The proposed approaches are validated with extensive experiments on state-of-the-art neural networks, including AlexNet, VGG-16, GoogLeNet, and ResNet-18. Both SLQ and MLQ achieve impressive results.


Extremely Low Bit Neural Network: Squeeze the Last Bit Out With ADMM

Leng, Cong (Alibaba Group) | Dou, Zesheng (Alibaba Group) | Li, Hao (Alibaba Group) | Zhu, Shenghuo (Alibaba Group) | Jin, Rong (Alibaba Group)

AAAI Conferences

Although deep learning models are highly effective for various learning tasks, their high computational costs prohibit deployment in scenarios where either memory or computational resources are limited. In this paper, we focus on compressing and accelerating deep models whose network weights are represented with very small numbers of bits, referred to as extremely low-bit neural networks. We model this problem as a discretely constrained optimization problem. Borrowing the idea of the Alternating Direction Method of Multipliers (ADMM), we decouple the continuous parameters from the discrete constraints of the network and cast the original hard problem into several subproblems. We propose to solve these subproblems using extragradient and iterative quantization algorithms, which lead to considerably faster convergence than conventional optimization methods. Extensive experiments on image recognition and object detection verify that the proposed algorithm is more effective than state-of-the-art approaches for extremely low-bit neural networks.
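The decoupling idea can be sketched in a few lines. The discrete set {-alpha, 0, +alpha} and the single consensus round below are simplifications; the paper's continuous subproblem (an extragradient step on the task loss) is omitted:

```python
import numpy as np

def project_ternary(W, alpha):
    # Euclidean projection onto {-alpha, 0, +alpha}: snap each weight
    # to the nearest element of the discrete set.
    return alpha * np.clip(np.round(W / alpha), -1, 1)

# ADMM-style splitting (one round, illustrative): the continuous weights W
# and their discrete copy G are tied together through a dual variable U.
rng = np.random.default_rng(0)
W = rng.normal(size=10)
U = np.zeros(10)
alpha = 0.5
G = project_ternary(W + U, alpha)   # discrete subproblem: closed-form projection
U = U + W - G                       # dual update keeps W and G in consensus
print(np.unique(G))
```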