An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits
Cody Steinmetz, Gavin Childress, Aaron Herbst, Gavin Jones, Jasdeep Singh, Eli Vang, Keagan Weinstock
Large language models (LLMs) have transformed natural-language processing, yet their scale makes real-world deployment costly. Post-training quantization reduces memory and computation but often degrades accuracy, while quantization-aware training can recover performance at the cost of extra training. Pushing quantization to the ternary regime (1.58 bits per weight) yields even larger savings but is notoriously unstable. Building on recent work showing that a bias-free, RMS-normalized Transformer with straight-through estimation can reach 1.58-bit precision, we demonstrate that simply inserting RMS normalization before every linear projection and applying a gradual, layer-wise quantization schedule stably fine-tunes full-precision checkpoints into ternary LLMs. Our approach matches or surpasses more elaborate knowledge-distillation pipelines on standard language-modeling benchmarks without adding model complexity. These results indicate that careful normalization alone can close much of the accuracy gap between ternary and full-precision LLMs, making ultra-low-bit inference practical.
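The abstract names three ingredients: an RMSNorm inserted before every linear projection, ternary weights trained with a straight-through estimator, and a gradual, layer-wise quantization schedule. The following is a minimal PyTorch sketch of how these pieces could fit together; the `TernaryLinear` and `apply_layerwise_schedule` names are hypothetical, and the absmean ternary quantizer is an assumption borrowed from BitNet-b1.58-style recipes rather than a detail stated in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Standard RMS normalization: x / rms(x) * g."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class TernaryLinear(nn.Module):
    """Hypothetical layer: a linear projection preceded by an extra RMSNorm,
    with optional ternary weight quantization trained via a straight-through
    estimator. The absmean quantizer below is assumed, not taken from the
    abstract."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)
        self.norm = RMSNorm(in_features)  # the "extra RMSNorm" before the projection
        self.quantize = False             # flipped on by the layer-wise schedule

    def _ternarize(self, w: torch.Tensor) -> torch.Tensor:
        scale = w.abs().mean().clamp(min=1e-8)          # absmean scale
        w_q = (w / scale).round().clamp(-1, 1) * scale  # weights in {-1, 0, +1} * scale
        return w + (w_q - w).detach()                   # straight-through estimator

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self._ternarize(self.weight) if self.quantize else self.weight
        return F.linear(self.norm(x), w)


def apply_layerwise_schedule(model: nn.Module, step: int, steps_per_layer: int) -> None:
    """One plausible reading of the gradual, layer-wise schedule: enable
    ternary quantization on one additional TernaryLinear every
    `steps_per_layer` optimizer steps."""
    layers = [m for m in model.modules() if isinstance(m, TernaryLinear)]
    n_active = min(step // steps_per_layer, len(layers))
    for i, layer in enumerate(layers):
        layer.quantize = i < n_active
```

Enabling quantization one layer at a time lets the still-full-precision layers absorb the perturbation introduced by each newly ternarized layer, which is one plausible reading of why a gradual schedule stabilizes fine-tuning relative to quantizing every projection at once.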
arXiv.org Artificial Intelligence
May-15-2025
- Country:
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.05)
- Genre:
- Research Report > New Finding (0.47)
- Technology: