Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning
Zhou, Sashuai; Huang, Hai; Xia, Yan
arXiv.org Artificial Intelligence
Multi-modal models excel at cross-modal tasks but are computationally expensive because of their billions of parameters. Parameter-efficient fine-tuning (PEFT) addresses this by adding small trainable components while keeping the pre-trained parameters frozen. However, existing methods focus primarily on uni-modal processing and overlook the modal fusion that multi-modal tasks require. To fill this gap, we propose heterogeneous mixture-of-experts adapters, which extend the traditional PEFT framework to support multi-modal expert combinations and improve information interaction between modalities. Our approach also modifies the affine linear expert design to enable efficient modal fusion in a low-rank space, achieving competitive performance while fine-tuning only 5-8% of the parameters. Experiments across eight downstream tasks, including visual-audio and text-visual tasks, demonstrate the superior performance of the approach.
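The abstract does not spell out the architecture, so the following is only a generic sketch of the idea it describes: a frozen feature passes through a residual adapter whose experts are low-rank, some uni-modal and some fusing a second modality in the low-rank space. The class names (`LowRankExpert`, `FusionExpert`, `HeterogeneousMoEAdapter`), the softmax router, the element-wise product used for fusion, and the zero-initialised up-projection are all assumptions made for illustration, not the paper's design.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def matvec(W, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def count(*mats):
    """Number of scalar entries across the given matrices."""
    return sum(len(row) for M in mats for row in M)


class LowRankExpert:
    """Uni-modal expert (assumed form): up(down(x)) with rank r much smaller than d."""

    def __init__(self, d, r, rng):
        self.down = rand_matrix(r, d, rng)
        self.up = zeros(d, r)  # zero init: the adapter starts as a no-op

    def __call__(self, x, other):
        return matvec(self.up, matvec(self.down, x))

    def n_params(self):
        return count(self.down, self.up)


class FusionExpert:
    """Cross-modal expert (assumed form): both modalities are projected to
    rank r, fused by element-wise product, then projected back up."""

    def __init__(self, d, d_other, r, rng):
        self.down = rand_matrix(r, d, rng)
        self.down_other = rand_matrix(r, d_other, rng)
        self.up = zeros(d, r)

    def __call__(self, x, other):
        h = matvec(self.down, x)
        g = matvec(self.down_other, other)
        fused = [a * b for a, b in zip(h, g)]  # fusion happens in the rank-r space
        return matvec(self.up, fused)

    def n_params(self):
        return count(self.down, self.down_other, self.up)


class HeterogeneousMoEAdapter:
    """Softmax-gated mix of uni-modal and fusion experts with a residual path."""

    def __init__(self, d, d_other, r, n_uni, n_fusion, seed=0):
        rng = random.Random(seed)
        self.experts = [LowRankExpert(d, r, rng) for _ in range(n_uni)]
        self.experts += [FusionExpert(d, d_other, r, rng) for _ in range(n_fusion)]
        self.router = rand_matrix(len(self.experts), d, rng)

    def __call__(self, x, other):
        gates = softmax(matvec(self.router, x))
        out = list(x)  # residual: the frozen backbone feature passes through
        for gate, expert in zip(gates, self.experts):
            delta = expert(x, other)
            out = [o + gate * dv for o, dv in zip(out, delta)]
        return out

    def n_params(self):
        return sum(e.n_params() for e in self.experts) + count(self.router)


adapter = HeterogeneousMoEAdapter(d=8, d_other=6, r=2, n_uni=2, n_fusion=1)
visual = [0.5] * 8   # toy feature of the primary modality
audio = [1.0] * 6    # toy feature of the second modality
print(adapter(visual, audio), adapter.n_params())
```

In this toy configuration the adapter holds 132 trainable values (two uni-modal experts at 32 each, one fusion expert at 44, and a 24-entry router), and because the up-projections start at zero the output initially equals the input. The share of trainable parameters in a real model depends on the backbone size and the rank, which this sketch does not attempt to reproduce.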
Mar-26-2025
- Country:
  - North America > Dominican Republic (0.04)
  - Europe
    - France (0.04)
    - Austria (0.04)
    - Romania > Sud - Muntenia Development Region
      - Giurgiu County > Giurgiu (0.04)
  - Asia
    - China (0.05)
    - Middle East > Jordan (0.04)
- Genre:
  - Research Report (0.82)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Machine Learning (1.00)