Wings: Learning Multimodal LLMs without Text-only Forgetting

Dec-24-2025, 23:56:56 GMT–Neural Information Processing Systems

Multimodal large language models (MLLMs) are initialized from a trained LLM, first aligning images with text and then fine-tuning on mixed multimodal inputs. During this continued training, however, the MLLM catastrophically forgets the text-only instructions that the initial LLM had mastered.

large language model, learning multimodal llm, natural language

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)