PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Neural Information Processing Systems
To parameter-efficiently fine-tune (PEFT) large language models (LLMs), the low-rank adaptation (LoRA) method approximates the model changes $\Delta W \in \mathbb{R}^{m \times n}$ through the product of two matrices $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$, where $r \ll \min(m, n)$, $A$ is initialized with Gaussian noise, and $B$ with zeros. LoRA **freezes the original model $W$** and **updates the "Noise & Zero" adapter**, which may lead to slow convergence. To overcome this limitation, we introduce **P**r**i**ncipal **S**ingular values and **S**ingular vectors **A**daptation (PiSSA). PiSSA shares the same architecture as LoRA, but initializes the adapter matrices $A$ and $B$ with the principal components of the original matrix $W$ and puts the remaining components into a residual matrix $W^{res} \in \mathbb{R}^{m \times n}$, which is frozen during fine-tuning. Compared to LoRA, PiSSA **updates the principal components** while **freezing the "residual" parts**, allowing faster convergence and enhanced performance. Comparative experiments of PiSSA and LoRA across 11 different models, ranging from 184M to 70B parameters and encompassing 5 NLG and 8 NLU tasks, reveal that PiSSA consistently outperforms LoRA under identical experimental setups.
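The two initialization schemes can be contrasted with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a thin SVD of $W$ and splits the top-$r$ singular values evenly between $A$ and $B$ via their square roots; the function names `pissa_init` and `lora_init` and the Gaussian scale used for LoRA are hypothetical choices for this example. It checks that $W^{res} + AB$ reproduces $W$ at initialization, whereas the LoRA-style "Noise & Zero" adapter starts with $AB = 0$.

```python
import numpy as np


def pissa_init(W: np.ndarray, r: int):
    """Split W into a trainable rank-r adapter (A, B) and a frozen residual.

    Assumption (for this sketch): the principal singular values are split
    evenly between A and B by taking their square roots.
    """
    # Thin SVD: W = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    # Principal components initialize the adapter: A is m x r, B is r x n.
    A = U[:, :r] * sqrt_S            # scales column j of U[:, :r] by sqrt(S[j])
    B = sqrt_S[:, None] * Vt[:r, :]  # scales row j of Vt[:r, :] by sqrt(S[j])
    # The remaining components form the frozen residual matrix W^{res}.
    W_res = U[:, r:] @ np.diag(S[r:]) @ Vt[r:, :]
    return A, B, W_res


def lora_init(m: int, n: int, r: int, rng: np.random.Generator):
    """LoRA-style "Noise & Zero" init for comparison: A Gaussian, B zeros."""
    A = rng.normal(scale=0.02, size=(m, r))  # hypothetical, illustrative scale
    B = np.zeros((r, n))
    return A, B


rng = np.random.default_rng(0)
m, n, r = 64, 48, 4
W = rng.normal(size=(m, n))  # stand-in for a pretrained weight matrix

A, B, W_res = pissa_init(W, r)
print(A.shape, B.shape, W_res.shape)  # (64, 4) (4, 48) (64, 48)
print(np.allclose(W_res + A @ B, W))  # True: the layer is unchanged at init

A_l, B_l = lora_init(m, n, r, rng)
print(np.abs(A_l @ B_l).max())        # 0.0: LoRA's adapter starts at zero
```

In both schemes the layer's output is unchanged before training; the difference is that PiSSA's trainable matrices start on the directions carrying the largest singular values of $W$, while LoRA's start from noise and zeros.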
May-27-2025, 19:03:49 GMT