The Expressive Power of Low-Rank Adaptation

Yuchen Zeng, Kangwook Lee

arXiv.org Machine Learning 

Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite its huge success in practice, the theoretical underpinnings of LoRA have largely remained unexplored. This paper takes the first step toward bridging this gap by theoretically analyzing the expressive power of LoRA. We also quantify the approximation error when the LoRA rank falls below the threshold required for exact adaptation. All our theoretical insights are validated by numerical experiments.

Recent foundation models, such as large language models (OpenAI, 2023; Liu et al., 2019; Touvron et al., 2023), have achieved remarkable success in a wide range of applications. Due to their substantial size, the standard full fine-tuning approach, in which all of the model's parameters are updated for a specialized task, is becoming increasingly difficult and inefficient. This has led to the growing popularity of parameter-efficient fine-tuning approaches (Hu et al., 2022a; Liu et al., 2022; Ben Zaken et al., 2022; Hu et al., 2022b). Instead of updating all parameters, these approaches selectively update smaller subsets of weights or introduce lightweight adapters, thereby greatly decreasing the computational and storage costs. The most dominant approach along this line is Low-Rank Adaptation (LoRA) (Hu et al., 2022a), which adds lightweight low-rank adapters to the pre-trained weight matrices. Far from merely enhancing computational efficiency, empirical evidence has shown that LoRA can match or even exceed the performance of full fine-tuning (Hu et al., 2022a). To date, LoRA has been widely used and has achieved considerable success in adapting large language models (Hu et al., 2022a; Dinh et al., 2022b) and image generation models (Ryu, 2023; Fan et al., 2023) to various downstream tasks.
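The mechanics of a LoRA adapter can be illustrated with a short sketch (not from the paper; dimensions, rank, and the scaling factor `alpha` are illustrative assumptions). A frozen weight matrix W0 is augmented with a trainable low-rank product BA, so the adapted layer computes with W0 + BA while only B and A are updated:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 8, 8, 2                  # layer dimensions and adapter rank (illustrative)
W0 = rng.standard_normal((d, k))   # frozen pre-trained weight
B = np.zeros((d, r))               # LoRA factor, initialized to zero (standard practice)
A = rng.standard_normal((r, k))    # LoRA factor

def lora_forward(x, W0, B, A, alpha=1.0):
    """Adapted linear layer: W0 stays frozen; only B and A are trained."""
    return x @ (W0 + alpha * (B @ A)).T

x = rng.standard_normal((1, k))
# With B initialized to zero, the adapted model coincides with the pre-trained one.
assert np.allclose(lora_forward(x, W0, B, A), x @ W0.T)
# The weight update B @ A has rank at most r, regardless of d and k.
assert np.linalg.matrix_rank(B @ A) <= r
```

The rank constraint is what makes LoRA parameter-efficient: the adapter stores d*r + r*k parameters instead of d*k.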
Despite the empirical success of LoRA, little is known in theory about how it works. In fact, several crucial theoretical questions remain open, such as: What is the minimum rank of the LoRA adapters required to adapt a pre-trained model to match the functionality of a given target model? How does the model architecture (i.e., depth, width) affect this minimal rank? If the adapter rank is lower than this threshold, what is the resulting approximation error?
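The last question can be made concrete for a single linear layer (a simplified setting, not the paper's general analysis): if the required weight update Delta = W_target - W0 has full rank, a rank-r adapter can at best realize the best rank-r approximation of Delta, whose error is characterized by the Eckart-Young theorem via truncated SVD. A minimal numerical sketch, with randomly generated weights standing in for pre-trained and target models:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
W0 = rng.standard_normal((d, d))        # stand-in for a pre-trained weight
W_target = rng.standard_normal((d, d))  # stand-in for the target model's weight
Delta = W_target - W0                   # required update; generically full rank

def best_rank_r(M, r):
    """Best rank-r approximation of M in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# Approximation error of a rank-r adapter, for every rank from 0 to d.
errors = [np.linalg.norm(Delta - best_rank_r(Delta, r)) for r in range(d + 1)]

# The error shrinks monotonically as the rank grows ...
assert all(errors[i] >= errors[i + 1] - 1e-9 for i in range(d))
# ... and vanishes once the rank reaches the threshold rank(Delta) = d.
assert errors[d] < 1e-9
```

This toy setting captures the trade-off the questions above point at: below the threshold rank the error is nonzero and governed by the discarded singular values; at or above it, exact matching is possible.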
