ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao, Longteng Guo, Tongtian Yue, Sihan Chen, Shuai Shao, Xinxin Zhu, Zehuan Yuan, Jing Liu
Building general-purpose models that can perceive diverse real-world modalities and solve various tasks is an appealing target in artificial intelligence. In this paper, we present ChatBridge, a novel multimodal language model that leverages the expressive capabilities of language as the catalyst to bridge the gap between various modalities. We show that language-paired two-modality data alone is sufficient to connect all modalities. ChatBridge builds on recent large language models (LLMs) and extends their zero-shot capabilities to diverse multimodal inputs. ChatBridge is trained in two stages. The first stage aligns each modality with language, which yields emergent multimodal correlation and collaboration abilities. The second stage instruction-finetunes ChatBridge to align it with user intent, using our newly proposed multimodal instruction-tuning dataset, MULTIS, which covers 16 multimodal tasks spanning text, image, video, and audio. We report strong quantitative and qualitative results on zero-shot multimodal tasks across these four modalities. All code, data, and models of ChatBridge will be open-sourced.
arXiv.org Artificial Intelligence
May-25-2023
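The abstract describes a two-stage recipe: first align each modality to language with a learnable adapter over frozen encoders, then instruction-tune on MULTIS-style data. The following is a minimal, hypothetical PyTorch sketch of that idea; `PerceiverAdapter`, `ToyLLM`, and `train_stage` are illustrative names and stand-ins, not the paper's actual architecture, data format, or API.

```python
# Hypothetical sketch of the two-stage training described in the abstract.
# Assumes: a frozen per-modality encoder (its features arrive as `feats`),
# a learnable perceiver-style adapter, and an LLM that is frozen in stage 1
# and tuned in stage 2. All module names here are illustrative.
import torch
import torch.nn as nn

class PerceiverAdapter(nn.Module):
    """Compresses frozen encoder features into a fixed number of LLM-space tokens."""
    def __init__(self, enc_dim: int, llm_dim: int, num_queries: int = 32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, llm_dim) * 0.02)
        self.attn = nn.MultiheadAttention(
            llm_dim, num_heads=8, kdim=enc_dim, vdim=enc_dim, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        q = self.queries.expand(feats.size(0), -1, -1)  # (B, Q, llm_dim)
        out, _ = self.attn(q, feats, feats)             # cross-attend to encoder features
        return out                                      # tokens consumed by the LLM

class ToyLLM(nn.Module):
    """Stand-in for a real LLM: predicts text tokens from prefix embeddings."""
    def __init__(self, llm_dim: int, vocab: int):
        super().__init__()
        self.head = nn.Linear(llm_dim, vocab)

    def forward(self, prefix: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        logits = self.head(prefix.mean(dim=1, keepdim=True)).expand(-1, labels.size(1), -1)
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), labels.reshape(-1))

def train_stage(adapter, llm, batches, tune_llm: bool, lr: float = 1e-4):
    # Stage 1: tune_llm=False -- only the adapter learns, from language-paired
    # two-modality data (e.g. image-caption pairs). Stage 2: tune_llm=True --
    # instruction-tune on MULTIS-style (instruction, multimodal input, response) data.
    for p in llm.parameters():          # freeze the LLM in stage 1, unfreeze in stage 2
        p.requires_grad_(tune_llm)
    params = list(adapter.parameters()) + (list(llm.parameters()) if tune_llm else [])
    opt = torch.optim.AdamW(params, lr=lr)
    for feats, text_ids in batches:
        loss = llm(adapter(feats), labels=text_ids)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Smoke test with random "encoder features" and token ids.
adapter, llm = PerceiverAdapter(enc_dim=512, llm_dim=256), ToyLLM(256, vocab=1000)
feats = torch.randn(2, 50, 512)         # e.g. output of a frozen vision/audio encoder
text = torch.randint(0, 1000, (2, 16))  # paired caption / response token ids
train_stage(adapter, llm, [(feats, text)], tune_llm=False)  # stage 1: alignment
train_stage(adapter, llm, [(feats, text)], tune_llm=True)   # stage 2: instruction tuning
```

The design point the sketch isolates is that only the lightweight adapter changes between modalities, so each new modality needs only language-paired data with that one modality, while the shared LLM supplies the cross-modal "bridge".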