Run LoRA Run: Faster and Lighter LoRA Implementations

Cherniuk, Daria, Mikhalev, Aleksandr, Oseledets, Ivan

arXiv.org Artificial Intelligence 

The LoRA paper Hu et al. [2022] introduced low-rank adapters for fine-tuning large language models (LLMs) on downstream tasks. This approach quickly became popular due to the reduced cost of the weight update. Several modifications of LoRA followed: for example, QLoRA Dettmers et al. [2023] adds quantization to further reduce fine-tuning costs, and ReLoRA Lialin et al. [2023] showed that low-rank updates can also be used for full pre-training. However, all variations of LoRA use the same chain of operations to compute the output, which often leads to a sub-optimal computation graph. We propose RunLoRA: a framework that contains different variants of the forward and backward pass through an adapter-induced linear layer and chooses the best pair for a given architecture. We evaluated our framework's performance on a series of Llama models and achieved up to 17% speedup purely from an optimized chain of PyTorch operations. Additionally, we saved up to 4 GB of memory by reducing the number of saved activations.
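To illustrate the kind of choice the abstract refers to, the sketch below shows a LoRA-augmented linear layer in PyTorch with two mathematically equivalent orderings of the adapter matmuls; which one is cheaper depends on batch size, hidden dimension, and rank. This is a minimal illustrative example, not the RunLoRA implementation, and the class and parameter names (LoRALinear, rank, alpha) are assumptions.

```python
# Minimal sketch of a LoRA-augmented linear layer (illustrative only;
# not the RunLoRA implementation).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Frozen pretrained weight W (out_features x in_features)
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False
        )
        # Trainable low-rank factors A (in_features x rank) and B (rank x out_features)
        self.A = nn.Parameter(torch.randn(in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_features))
        self.scale = alpha / rank

    def forward(self, x):
        # Variant 1: (x @ A) @ B -- two skinny matmuls, cheap when rank is small.
        lora_out = (x @ self.A) @ self.B
        # Variant 2: x @ (A @ B) -- materializes a full in_features x out_features
        # matrix first; the cheaper ordering depends on shapes.
        # lora_out = x @ (self.A @ self.B)
        return x @ self.weight.T + self.scale * lora_out

x = torch.randn(4, 128, 1024)          # (batch, seq, hidden)
layer = LoRALinear(1024, 1024, rank=8)
print(layer(x).shape)                  # torch.Size([4, 128, 1024])
```

Analogous orderings exist for the backward pass (e.g., which intermediate activations to save versus recompute), which is where the memory savings reported in the abstract come from.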