BitDelta: Your Fine-Tune May Only Be Worth One Bit

May-26-2025, 17:27:55 GMT–Neural Information Processing Systems

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it is intuitive to assume that fine-tuning adds less new information to the model, and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into their pre-trained components and an additional delta. We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance. This interesting finding not only highlights the potential redundancy of information added during fine-tuning, but also has significant implications for the multi-tenant serving and multi-tenant storage of fine-tuned models.

bitdelta, large language model, machine learning, (5 more...)

Neural Information Processing Systems

May-26-2025, 17:27:55 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.85)
  - Natural Language > Large Language Model (0.81)