Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning

Mar-26-2025–arXiv.org Artificial Intelligence

Multi-modal models excel at cross-modal tasks but are computationally expensive because of their billions of parameters. Parameter-efficient fine-tuning (PEFT) addresses this by adding small trainable components while freezing the pre-trained parameters. However, existing methods primarily focus on uni-modal processing, overlooking the modal fusion that multi-modal tasks require. To fill this gap, we propose heterogeneous mixture-of-experts (MoE) adapters that extend the traditional PEFT framework to support multi-modal expert combinations and improve information interaction. Additionally, our approach modifies the affine linear expert design to enable efficient modal fusion in a low-rank space, achieving competitive performance with only 5-8% of the parameters fine-tuned. Experiments across eight downstream tasks, including visual-audio and text-visual tasks, demonstrate the superior performance of the approach.
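To make the idea concrete, here is a minimal sketch of what an adapter mixing uni-modal and cross-modal low-rank experts might look like. It is not the paper's implementation: the abstract does not specify the expert or router design, so the class names (`LowRankExpert`, `FusionExpert`, `HeterogeneousMoEAdapter`), the element-wise product used for fusion, the dense softmax gating, and all dimensions are assumptions chosen for illustration. The frozen backbone is omitted, and only the adapter's residual update is shown.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) nested-list matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; stands in for trainable parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LowRankExpert:
    """Hypothetical uni-modal expert: down-project to rank r, ReLU, up-project."""

    def __init__(self, d, r, rng):
        self.down = rand_matrix(r, d, rng)
        # Zero-initialised up-projection, so this expert starts as a no-op.
        self.up = [[0.0] * r for _ in range(d)]

    def __call__(self, x, other):
        hidden = [max(0.0, h) for h in matvec(self.down, x)]
        return matvec(self.up, hidden)


class FusionExpert:
    """Hypothetical cross-modal expert: both modalities are projected to the
    same rank-r space and combined there (element-wise product, an assumption)."""

    def __init__(self, d, d_other, r, rng):
        self.down = rand_matrix(r, d, rng)
        self.down_other = rand_matrix(r, d_other, rng)
        self.up = rand_matrix(d, r, rng)

    def __call__(self, x, other):
        a = matvec(self.down, x)
        b = matvec(self.down_other, other)
        fused = [p * q for p, q in zip(a, b)]  # fusion happens in the low-rank space
        return matvec(self.up, fused)


class HeterogeneousMoEAdapter:
    """Mixes experts of different types with a dense softmax router (assumed)."""

    def __init__(self, d, d_other, r, rng):
        self.experts = [LowRankExpert(d, r, rng), FusionExpert(d, d_other, r, rng)]
        self.router = rand_matrix(len(self.experts), d, rng)

    def __call__(self, x, other):
        gates = softmax(matvec(self.router, x))
        out = list(x)  # residual path: the backbone feature passes through unchanged
        for gate, expert in zip(gates, self.experts):
            delta = expert(x, other)
            out = [o + gate * dv for o, dv in zip(out, delta)]
        return out, gates


rng = random.Random(0)
adapter = HeterogeneousMoEAdapter(d=8, d_other=6, r=2, rng=rng)
text_feat = [rng.uniform(-1, 1) for _ in range(8)]   # illustrative text feature
image_feat = [rng.uniform(-1, 1) for _ in range(6)]  # illustrative image feature
fused, gates = adapter(text_feat, image_feat)
print(len(fused), [round(g, 3) for g in gates])
```

In this sketch the trainable weights are only the rank-r projections and the router, which is what keeps the fine-tuned fraction small relative to a frozen backbone; the exact fraction depends on the backbone size and rank, which are not given here.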

information, machine learning, natural language

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (1.00)
  - Machine Learning (1.00)
