Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization

Neural Information Processing Systems 

While parameter-efficient fine-tuning (PEFT) methods aim to reduce the memory consumed by the optimizer state during fine-tuning, the sheer size of the pre-trained LLM weights themselves remains a pressing concern. Although quantization techniques have been widely proposed to ease memory demands and accelerate LLM inference, most of them target the deployment phase rather than fine-tuning.
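As background for the abstract's premise, the sketch below illustrates generic sub-4-bit-capable round-to-nearest integer quantization with a per-output-channel scale and zero-point. It is a minimal, hedged example of the general technique the abstract refers to, not the specific method proposed in this paper; the function names and the choice of PyTorch are illustrative assumptions.

```python
import torch

def quantize_per_channel(w: torch.Tensor, bits: int = 4):
    """Asymmetric round-to-nearest integer quantization with one scale and
    zero-point per output channel (row). Generic illustration only; this is
    not the paper's specific quantization scheme."""
    qmax = 2 ** bits - 1                        # e.g. 15 for 4-bit, 7 for 3-bit
    w_min = w.min(dim=1, keepdim=True).values   # per-channel minimum
    w_max = w.max(dim=1, keepdim=True).values   # per-channel maximum
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero = torch.round(-w_min / scale)          # integer zero-point
    q = torch.clamp(torch.round(w / scale + zero), 0, qmax).to(torch.uint8)
    return q, scale, zero

def dequantize(q: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor):
    """Reconstruct an approximate floating-point weight from integer codes."""
    return (q.float() - zero) * scale

# Usage: quantize a weight matrix to 4-bit codes, then reconstruct it.
w = torch.randn(4096, 4096)
q, s, z = quantize_per_channel(w, bits=4)
w_hat = dequantize(q, s, z)
print((w - w_hat).abs().max())  # worst-case per-element reconstruction error
```

Per-channel (rather than per-tensor) scaling is a common choice at sub-4-bit precision because weight ranges vary widely across channels, and a single global scale would waste the already narrow integer grid.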
