MM-LLMs: Recent Advances in MultiModal Large Language Models

Zhang, Duzhen, Yu, Yahan, Li, Chenxing, Dong, Jiahua, Su, Dan, Chu, Chenhui, Yu, Dong

Jan-24-2024, arXiv.org Artificial Intelligence

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also support a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research on MM-LLMs. Specifically, we first outline general design formulations for model architecture and training pipeline. Subsequently, we provide brief introductions to 26 existing MM-LLMs, each characterized by its specific formulations. Additionally, we review the performance of MM-LLMs on mainstream benchmarks and summarize key training recipes for improving MM-LLMs. Lastly, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.

Country:
- North America > United States
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
  - Hawaii > Honolulu County
    - Honolulu (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Switzerland > Zürich
    - Zürich (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
- Asia
  - Singapore (0.04)
  - Middle East > Jordan (0.04)
  - Japan > Honshū
    - Kansai > Kyoto Prefecture > Kyoto (0.04)
  - China > Liaoning Province
    - Shenyang (0.04)

Genre:
- Overview (1.00)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.96)