From Specific-MLLM to Omni-MLLM: A Survey about the MLLMs aligned with Multi-Modality

Jiang, Shixin, Liang, Jiafeng, Liu, Ming, Qin, Bing

Dec-16-2024–arXiv.org Artificial Intelligence

The evolution from Specific-MLLMs, which excel at single-modality tasks, to Omni-MLLMs, which extend coverage to a broader range of modalities, aims to achieve both understanding and generation of multimodal information. Omni-MLLMs treat the features of different modalities as different "foreign languages," enabling cross-modal interaction and understanding within a unified space. To promote related research, we have compiled 47 relevant papers to provide the community with a comprehensive introduction to Omni-MLLMs. We first explain the four core components of Omni-MLLMs that enable unified modeling of, and interaction among, multiple modalities. Next, we describe how effective integration is achieved through "alignment pretraining" and "instruction fine-tuning," and discuss open-source datasets and the evaluation of interaction capabilities. Finally, we summarize the main challenges facing current Omni-MLLMs and outline future directions.
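The abstract's central idea, that each modality's features are treated like a "foreign language" and mapped into one space the LLM can read, can be illustrated with a minimal sketch. It assumes, hypothetically, one linear projector per modality that maps encoder features to the LLM's embedding size, and it simply places the projected modality tokens before the text tokens. The names `make_projector`, `project`, and `build_unified_sequence`, the dimensions, and the toy feature values are all illustrative assumptions, not the survey's method or any specific model's API.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def make_projector(in_dim: int, out_dim: int, seed: int) -> Matrix:
    """Hypothetical linear projector: a random out_dim x in_dim weight matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights: Matrix, feature: Vector) -> Vector:
    """Matrix-vector product mapping one encoder feature into the LLM embedding space."""
    if any(len(row) != len(feature) for row in weights):
        raise ValueError("feature size does not match projector input size")
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def build_unified_sequence(text_embeddings: List[Vector],
                           modality_features: Dict[str, List[Vector]],
                           projectors: Dict[str, Matrix]) -> List[Vector]:
    """Project every modality's features, then append the text token embeddings.

    The result is one sequence of same-sized vectors, i.e. a single "language"
    that a decoder-only LLM could consume (assumed layout: modalities first).
    """
    sequence: List[Vector] = []
    for name, feats in modality_features.items():  # dicts keep insertion order
        weights = projectors[name]
        sequence.extend(project(weights, f) for f in feats)
    sequence.extend(text_embeddings)
    return sequence


# Toy usage with hypothetical sizes.
LLM_DIM = 8  # hypothetical LLM hidden size
projectors = {
    "image": make_projector(16, LLM_DIM, seed=0),  # 16-dim hypothetical image features
    "audio": make_projector(12, LLM_DIM, seed=1),  # 12-dim hypothetical audio features
}
features = {
    "image": [[1.0] * 16 for _ in range(4)],  # 4 hypothetical image patch features
    "audio": [[0.5] * 12 for _ in range(3)],  # 3 hypothetical audio frame features
}
text = [[0.0] * LLM_DIM for _ in range(5)]  # 5 hypothetical text token embeddings

seq = build_unified_sequence(text, features, projectors)
print(len(seq), {len(v) for v in seq})
```

Here the 4 image and 3 audio feature vectors become 8-dimensional tokens alongside the 5 text tokens, giving one 12-token sequence. Real systems often use learned MLP or Q-Former-style connectors rather than a random linear map; the linear version is chosen only to keep the example dependency-free.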

large language model, machine learning, natural language

Country:
- Europe > Italy
  - Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > China
  - Heilongjiang Province > Harbin (0.04)
  - Guangdong Province > Shenzhen (0.04)

Genre:
- Overview (0.68)
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)