Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

May-27-2025, 14:09:06 GMT–Neural Information Processing Systems

A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix.

householder transformation, matrix, pre-trained vision transformer, (7 more...)

Neural Information Processing Systems

May-27-2025, 14:09:06 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Vision (0.78)