LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits

Mirzaei, Amir Reza, Wen, Yuqiao, Cao, Yanshuai, Mou, Lili

Nov-10-2025–arXiv.org Artificial Intelligence

Low-Rank Adaptation (LoRA) has become a popular technique for parameter-efficient fine-tuning of large language models (LLMs). In many real-world scenarios, multiple adapters are loaded simultaneously to enable LLM customization for personalized user experiences or to support a diverse range of tasks. Although each adapter is lightweight in isolation, their aggregate cost becomes substantial at scale. LoRAQuant is a mixed-precision quantization method for LoRA: it quantizes the important components of each adapter to higher precision, while quantizing the rest to ultra-low bitwidth. We conduct comprehensive experiments with LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B models on mathematical reasoning, coding, and summarization tasks.

Large language models (LLMs) have achieved remarkable performance across a wide range of natural language tasks (Ouyang et al., 2022; Wang et al., 2022; Zhao et al., 2023), but fine-tuning them for new applications remains computationally and memory intensive. To address this challenge, low-rank adaptation (LoRA; Hu et al., 2022) has emerged as a widely adopted method for parameter-efficient fine-tuning. LoRA introduces small, task-specific low-rank matrices; during adaptation, only these matrices are trained while the base model is frozen. An increasingly important use case of LoRA is LLM customization, as LLM providers (e.g., OpenAI and Google) allow users to personalize their own LLMs (OpenAI, 2025; Google Cloud, 2025).
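The claim that adapters are cheap individually but costly in aggregate can be made concrete with back-of-the-envelope arithmetic. The sketch below is illustrative only: the layer width, rank, number of adapted matrices, adapter count, and the 25% / 3-bit / 1-bit split are all assumptions chosen for the example, not values from the paper, and the storage estimate ignores quantization metadata such as scales and zero-points.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_in + d_out)


def average_bits(high_fraction: float, high_bits: float, low_bits: float) -> float:
    """Mean bits per parameter when a fraction is kept at higher precision."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError("high_fraction must be in [0, 1]")
    return high_fraction * high_bits + (1.0 - high_fraction) * low_bits


def megabytes(num_params: int, bits_per_param: float) -> float:
    """Storage in MB (10^6 bytes), ignoring scales, zero-points, and other metadata."""
    return num_params * bits_per_param / 8 / 1e6


# Hypothetical setup (assumed for illustration, not taken from the paper):
# 4096-wide square projections, rank 16, 128 adapted weight matrices per adapter.
PARAMS_PER_ADAPTER = 128 * lora_param_count(4096, 4096, 16)

fp16_mb = megabytes(PARAMS_PER_ADAPTER, 16)
# Assumed split: 25% of parameters at 3 bits, the rest at 1 bit.
mixed_mb = megabytes(PARAMS_PER_ADAPTER, average_bits(0.25, 3, 1))

num_adapters = 1000  # assumed number of simultaneously stored adapters
print(f"parameters per adapter: {PARAMS_PER_ADAPTER:,}")
print(f"one adapter: fp16 {fp16_mb:.2f} MB, mixed {mixed_mb:.2f} MB")
print(f"{num_adapters} adapters: fp16 {num_adapters * fp16_mb / 1000:.2f} GB, "
      f"mixed {num_adapters * mixed_mb / 1000:.2f} GB")
```

Under these assumed numbers a single adapter is small, but a thousand of them in 16-bit precision reach tens of gigabytes, which is the setting where a lower average bitwidth starts to matter.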

large language model, machine learning, quantization

Country:
- North America > Canada (0.28)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Information Technology (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.54)
