Neural Information Processing Systems
In this paper, we reveal that most current efficient multimodal fine-tuning methods share a key limitation: they are borrowed directly from LLMs, often neglecting the intrinsic differences of multimodal scenarios and even hindering the full utilization of all modalities. Inspired by our empirical observation, we argue that unimodal adaptation and cross-modal adaptation are both essential for the effective fine-tuning of MLLMs. From this perspective, we propose Multimodal low-rank Adaptation (MokA), a multimodal-aware efficient fine-tuning strategy that takes multimodal characteristics into consideration.
Jun-22-2026, 17:59:35 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Overview (0.67)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Machine Learning > Neural Networks (1.00)
- Natural Language > Large Language Model (0.93)