GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection
Xutao Liao, Shaohui Li, Yuhui Xu, Zhi Li, Yu Liu, You He
arXiv.org Artificial Intelligence
A variety of parameter-efficient fine-tuning methods have emerged in recent years, enabling an increasing number of institutions and researchers to fine-tune LLMs to meet their specific requirements. Adapters (Rebuffi et al., 2017; Houlsby et al., 2019; Lin et al., 2020; Karimi Mahabadi et al., 2021a;b) enable parameter-efficient fine-tuning by inserting trainable layers into LLMs while keeping the remaining layers frozen; however, the inserted layers introduce additional inference latency. BitFit (Zaken et al., 2021) tunes only the bias terms of the network, drastically reducing the number of trainable parameters. Prompt tuning achieves parameter efficiency by optimizing a set of new input tokens or prompts for each task (Li & Liang, 2021; Lester et al., 2021; Hambardzumyan et al., 2021; Liu et al., 2023). Hu et al. (2022) introduced LoRA, which assumes that the weight updates during fine-tuning are low-rank and can therefore be expressed as the product of two low-rank matrices; because these trainable matrices can be merged into the original weights after fine-tuning, LoRA adds no inference latency. Recent studies have combined parameter-efficient fine-tuning with quantization to further improve memory efficiency when fine-tuning LLMs (Kwon et al., 2022; Dettmers et al., 2023; Chai et al., 2023; Xu et al., 2023). DoRA (Weight-Decomposed Low-Rank Adaptation; Liu et al., 2024) decomposes the pre-trained weights into magnitude and direction components and applies LoRA to the directional updates, improving learning capacity and training stability across tasks without incurring additional inference cost.
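To make the LoRA formulation concrete, the sketch below shows a linear layer whose frozen weight W receives a trainable low-rank update ΔW = BA that can later be merged back into W. This is a minimal PyTorch sketch, not the authors' implementation: the class name `LoRALinear`, the rank `r`, the scaling factor `alpha`, and the random stand-in for a pre-trained weight are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W + (alpha/r) * B @ A.
    Illustrative sketch; the random weight below stands in for a pre-trained matrix."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Frozen "pre-trained" weight (placeholder initialization for the sketch).
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)
        # Low-rank factors: B starts at zero, so the update is initially zero
        # and the adapted model matches the base model before training.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T

    @torch.no_grad()
    def merge(self) -> None:
        # Fold the low-rank update into the frozen weight once fine-tuning is done.
        self.weight += self.scaling * (self.lora_B @ self.lora_A)
```

After `merge()` the layer computes a single dense product with W' = W + (alpha/r)·BA, which is why LoRA incurs no extra latency at inference time, unlike adapter layers.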
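The DoRA decomposition can be sketched in the same style: the pre-trained weight is split into a trainable per-column magnitude and a direction that is updated through LoRA factors. Again a hedged sketch under the same assumptions (hypothetical class name `DoRALinear`, random stand-in for the pre-trained weight); details of the published method, such as how gradients are routed through the normalization, are omitted here.

```python
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    """Weight-decomposed low-rank adaptation sketch: W' = m * (V + BA) / ||V + BA||,
    where ||.|| is the column-wise norm and only m, A, B are trainable."""
    def __init__(self, in_features: int, out_features: int, r: int = 8):
        super().__init__()
        W = torch.randn(out_features, in_features) * 0.02  # placeholder pre-trained weight
        self.V = nn.Parameter(W.clone(), requires_grad=False)       # frozen direction base
        self.m = nn.Parameter(W.norm(dim=0, keepdim=True))          # trainable column magnitudes
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.lora_B @ self.lora_A                 # low-rank directional update
        directed = self.V + delta
        # Renormalize each column, then rescale by the learned magnitude.
        W_adapted = self.m * directed / directed.norm(dim=0, keepdim=True)
        return x @ W_adapted.T
```

As with LoRA, the adapted weight W' can be materialized as a single dense matrix after fine-tuning, so the decomposition adds no inference cost.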
Dec-15-2024