Learning Multimodal LLMs without Text-only Forgetting
Yang Li
Neural Information Processing Systems
Multimodal large language models (MLLMs), initialized from a pretrained LLM, are first trained to align images with text and then fine-tuned on mixed multimodal inputs. However, during this continued training, the MLLM catastrophically forgets the text-only instruction-following ability that the initial LLM had mastered.
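As an illustration of the phenomenon described above (not the paper's method), text-only forgetting can be quantified as the drop in accuracy on a held-out text-only instruction set between the initial LLM and the multimodally fine-tuned MLLM. A minimal sketch, with toy stand-in models and a hypothetical eval set:

```python
# Illustrative sketch: measuring text-only forgetting as the accuracy
# drop on a text-only eval set after multimodal fine-tuning.
# The models and eval data here are toy stand-ins, not the paper's setup.

def accuracy(model, eval_set):
    """Fraction of prompts the model answers correctly."""
    correct = sum(1 for prompt, answer in eval_set if model(prompt) == answer)
    return correct / len(eval_set)

def forgetting(base_model, tuned_model, text_only_eval):
    """Text-only accuracy of the initial LLM minus that of the tuned MLLM."""
    return accuracy(base_model, text_only_eval) - accuracy(tuned_model, text_only_eval)

# Toy stand-ins: the base LLM answers everything; after fine-tuning,
# the model has drifted on two of the three text-only prompts.
eval_set = [("2+2=", "4"), ("capital of France?", "Paris"), ("3*3=", "9")]
base  = lambda p: {"2+2=": "4", "capital of France?": "Paris", "3*3=": "9"}[p]
tuned = lambda p: {"2+2=": "4", "capital of France?": "Lyon",  "3*3=": "8"}[p]

print(forgetting(base, tuned, eval_set))  # base acc 1.0, tuned acc 1/3
```

A positive value signals forgetting; the goal of the paper's setting is to keep this gap near zero while still gaining multimodal ability.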