How to Teach Large Multimodal Models New Skills

Zhu, Zhen; Gong, Yiming; Xiao, Yao; Liu, Yaoyao; Hoiem, Derek

Oct-10-2025–arXiv.org Artificial Intelligence

How can we teach large multimodal models (LMMs) new skills without erasing prior abilities? We study sequential fine-tuning on five target skills while monitoring general ability on eight held-out benchmarks across three model families. We observe that apparent "forgetting" on held-out tasks after narrow fine-tuning can partly recover at later stages. We trace this behavior to a measurable shift in the output token distribution, manifested through a simple counting-bias probe that co-varies with forgetting. Guided by this picture, we identify two simple, robust tuning recipes that learn strongly while limiting drift: (i) updating only the self-attention projection layers, and (ii) updating only the MLP Gate and Up projections while freezing the Down projection. Across models and tasks, these choices deliver strong target gains while largely preserving held-out performance. Code is available at https://github.com/jessemelpolio/LMM_CL
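
Both recipes come down to choosing which weight matrices receive gradients during fine-tuning. The sketch below shows one way to express that selection in Python, assuming Hugging Face LLaMA/Qwen-style parameter names (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`); other model families may name these layers differently. The names `RECIPES`, `trainable_names`, `apply_recipe`, and the `scope` argument are hypothetical and are not taken from the authors' repository. The abstract also does not say whether the vision encoder or projector is touched, so `scope` is only an illustrative way to restrict matching to one sub-module.

```python
"""Select which parameters to unfreeze for the two tuning recipes.

Assumes Hugging Face LLaMA/Qwen-style module names; other model
families may name these layers differently. All names here are
hypothetical and for illustration only.
"""
from typing import Iterable, List, Optional, Set

# Hypothetical recipe keys, used only in this sketch.
RECIPES = {
    # (i) self-attention projection layers only
    "attn_proj": ("q_proj", "k_proj", "v_proj", "o_proj"),
    # (ii) MLP Gate and Up projections; down_proj is left frozen
    "mlp_gate_up": ("gate_proj", "up_proj"),
}


def trainable_names(
    param_names: Iterable[str],
    recipe: str,
    scope: Optional[str] = None,
) -> List[str]:
    """Return the parameter names that `recipe` would unfreeze.

    `scope` is an optional substring (e.g. a language-model prefix) that
    restricts matching to one sub-module, since vision encoders often
    reuse names like "q_proj". This argument is an assumption of the sketch.
    """
    targets = RECIPES[recipe]
    selected = []
    for name in param_names:
        if scope is not None and scope not in name:
            continue
        # Compare whole dotted components, so only exact module names match.
        if any(part in targets for part in name.split(".")):
            selected.append(name)
    return selected


def apply_recipe(model, recipe: str, scope: Optional[str] = None) -> Set[str]:
    """Freeze every parameter of a torch.nn.Module except the selected ones."""
    params = list(model.named_parameters())
    keep = set(trainable_names((n for n, _ in params), recipe, scope))
    for name, p in params:
        p.requires_grad_(name in keep)
    return keep


if __name__ == "__main__":
    names = [
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.0.mlp.gate_proj.weight",
        "model.layers.0.mlp.up_proj.weight",
        "model.layers.0.mlp.down_proj.weight",
    ]
    print(trainable_names(names, "mlp_gate_up"))
    # prints ['model.layers.0.mlp.gate_proj.weight', 'model.layers.0.mlp.up_proj.weight']
```

After `apply_recipe`, only the unfrozen tensors need to go to the optimizer, for example `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`. Matching on exact dotted components rather than substrings keeps `down_proj` out of the Gate and Up recipe, but it also means fused layers (a single combined gate/up matrix in some architectures) would need their own entry.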

machine learning, natural language, target task

Country:
- North America > United States > Illinois (0.28)

Genre:
- Research Report (1.00)

Industry:
- Education (0.46)
- Health & Medicine (0.46)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.92)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Vision (0.68)