ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts

Du, Sinan, Zhang, Guosheng, Wang, Keyao, Wang, Yuanrui, Yue, Haixiao, Zhang, Gang, Ding, Errui, Wang, Jingdong, Xu, Zhengzhuo, Yuan, Chun

Dec-11-2024–arXiv.org Artificial Intelligence

Parameter-efficient transfer learning (PETL) has become a promising paradigm for adapting large-scale vision foundation models to downstream tasks. Typical methods rely on low-rank decomposition, learning task-specific weights while keeping the parameter count small. However, such approaches mostly operate within the original feature space using a single-branch structure, which may be suboptimal for decoupling the learned representations and patterns. In this paper, we propose ALoRE, a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts in a multi-branch paradigm, disentangling the learned cognitive patterns during training. By design, ALoRE adds negligible extra parameters and can be merged into the frozen backbone sequentially via re-parameterization, avoiding additional inference latency. We conduct extensive experiments on 24 image classification tasks using various backbone variants. Experimental results demonstrate that ALoRE outperforms full fine-tuning and other state-of-the-art PETL methods in both accuracy and parameter efficiency. For instance, ALoRE improves average Top-1 accuracy over full fine-tuning by 3.06% on the FGVC datasets and 9.97% on the VTAB-1k benchmark while updating only 0.15M parameters.
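The abstract does not give the exact parameterization, so the following is only a minimal sketch of the two ideas it names: aggregating low-rank experts through Kronecker products, and folding the result into a frozen layer so inference costs nothing extra. Everything specific here is an assumption made for illustration: the form of the update (a sum of Kronecker products of a small n x n matrix with a rank-r product), the residual adapter placed after the frozen layer, the sizes d, n and r, and all function names. The paper's formulation may differ; in particular, it may share factors across experts where this sketch gives each expert its own.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker product of a (p x q) and b (s x t), shape (p*s) x (q*t)."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def aggregate_experts(experts):
    """Assumed update form: delta = sum_i kron(S_i, B_i @ C_i)."""
    delta = None
    for s, b, c in experts:
        term = kron(s, matmul(b, c))
        delta = term if delta is None else add(delta, term)
    return delta

rng = random.Random(0)
d, n, r = 16, 4, 1  # hidden size, number of experts, rank (hypothetical values)

# Each expert: S_i is n x n, B_i is (d/n) x r, C_i is r x (d/n).
experts = [
    (rand_matrix(n, n, rng), rand_matrix(d // n, r, rng), rand_matrix(r, d // n, rng))
    for _ in range(n)
]
delta = aggregate_experts(experts)  # d x d update built from small factors

w0 = rand_matrix(d, d, rng, scale=1.0)  # stand-in for a frozen backbone weight
x = rand_matrix(d, 1, rng, scale=1.0)   # one input column vector

# Training-time path: frozen layer, then the residual adapter applied in sequence.
h = matmul(w0, x)
y_adapter = add(h, matmul(delta, h))

# Inference-time path: fold the adapter into the frozen weight once.
w_merged = matmul(add(identity(d), delta), w0)
y_merged = matmul(w_merged, x)

max_diff = max(abs(p[0] - q[0]) for p, q in zip(y_adapter, y_merged))
trainable = sum(len(m) * len(m[0]) for expert in experts for m in expert)
print(f"max output difference after merging: {max_diff:.2e}")
print(f"trainable parameters: {trainable} vs dense update: {d * d}")
```

Because the assumed adapter is linear, y = (I + delta) W0 x holds exactly, which is why the merged weight reproduces the two-step output up to floating-point error. The parameter count only illustrates how the Kronecker structure shrinks the update at these toy sizes; it says nothing about the 0.15M figure reported in the abstract.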

Country:
- Europe > Spain (0.14)

Genre:
- Research Report > New Finding (0.48)

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.92)
    - Natural Language > Large Language Model (0.67)
    - Vision > Image Understanding (0.66)
  - Sensing and Signal Processing > Image Processing (1.00)