Wings: Learning Multimodal LLMs without Text-only Forgetting
Neural Information Processing Systems
Multimodal large language models (MLLMs), initialized from a trained LLM, first align images with text and then fine-tune on mixed multimodal inputs. During this continued training, however, the MLLM catastrophically forgets how to follow the text-only instructions that the initial LLM had mastered.
Dec-24-2025, 23:56:56 GMT