MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Monajatipoor, Masoud, Li, Liunian Harold, Rouhsedaghat, Mozhdeh, Yang, Lin F., Chang, Kai-Wei

Jun-2-2023–arXiv.org Artificial Intelligence

Large-scale language models have shown the ability to adapt to a new task via conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to VL domain? Specifically, we first meta-trains a language model to perform in-context learning on NLP tasks (as in MetaICL); then we transfer this model to perform VL tasks by attaching a visual encoder. Our experiments suggest that indeed in-context learning ability can be transferred cross modalities: our model considerably improves the in-context learning capability on VL tasks and can even compensate for the size of the model significantly. On VQA, OK-VQA, and GQA, our method could outperform the baseline model while having 20 times fewer parameters.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Jun-2-2023

arXiv.org PDF

Add feedback

Country:
- Asia > China (0.04)
- Antarctica (0.04)
- South America (0.04)
- North America
  - Costa Rica (0.05)
  - United States > Washington
    - King County > Seattle (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Leisure & Entertainment > Sports (0.71)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found