MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

Haozhe Zhao, Zefan Cai, Shuzheng Si, Xiaojian Ma, Kaikai An, Liang Chen, Zixuan Liu, Sheng Wang, Wenjuan Han, Baobao Chang

arXiv.org Artificial Intelligence 

Since the resurgence of deep learning, vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity. However, while LLMs can utilize extensive background knowledge and task information with in-context learning, most VLMs still struggle to understand complex multi-modal prompts that contain multiple images, making them less effective in downstream vision-language tasks. In this paper, we address this limitation by 1) introducing MMICL, a new approach that allows the VLM to deal with multi-modal inputs efficiently; 2) proposing a novel context scheme to augment the in-context learning ability of the VLM; and 3) constructing the Multi-modal In-Context Learning (MIC) dataset, designed to enhance the VLM's ability to understand complex multi-modal prompts. Our experiments confirm that MMICL achieves new state-of-the-art zero-shot performance on a wide range of general vision-language tasks, especially on complex benchmarks such as MME and MMBench. Our analysis demonstrates that MMICL effectively tackles the challenge of complex multi-modal prompt understanding and exhibits impressive ICL ability. Furthermore, we observe that MMICL successfully alleviates language bias, a common issue that often leads VLMs to hallucinate when faced with extensive textual context.
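To make the idea of a multi-modal in-context prompt concrete, below is a minimal, hypothetical sketch of how one might interleave several demonstration images with text before a query. The `<image_k>` placeholder token, the `Demo` structure, and the demonstration layout are assumptions made purely for illustration; the actual context scheme and image-declaration format used by MMICL are defined in the paper, not here.

```python
# Illustrative sketch only: assemble an interleaved multi-image, few-shot
# (in-context learning) prompt for a generic VLM. Placeholder tokens and the
# demonstration format are assumptions, not MMICL's actual context scheme.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Demo:
    image_path: str   # path to the demonstration image
    question: str     # question asked about that image
    answer: str       # reference answer shown to the model as a demonstration


def build_icl_prompt(demos: List[Demo],
                     query_image: str,
                     query_question: str) -> Tuple[str, List[str]]:
    """Return (prompt_text, image_paths), one <image_k> placeholder per image."""
    parts, images = [], []
    for i, demo in enumerate(demos):
        images.append(demo.image_path)
        parts.append(f"Image {i}: <image_{i}>\nQ: {demo.question}\nA: {demo.answer}")
    # The query image takes the next placeholder index; its answer is left blank
    # so the model completes it.
    qi = len(images)
    images.append(query_image)
    parts.append(f"Image {qi}: <image_{qi}>\nQ: {query_question}\nA:")
    return "\n\n".join(parts), images


if __name__ == "__main__":
    prompt, images = build_icl_prompt(
        demos=[
            Demo("cat.jpg", "What animal is shown?", "A cat."),
            Demo("beach.jpg", "Where was this photo taken?", "On a beach."),
        ],
        query_image="street.jpg",
        query_question="How many people are in the image?",
    )
    print(prompt)   # interleaved text with <image_k> placeholders
    print(images)   # image paths, in placeholder order
```

In practice, the text with placeholders and the ordered image list would be fed to the VLM's processor so that each placeholder is replaced by the corresponding image's embedding; handling such interleaved multi-image prompts is exactly the capability the abstract describes.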
