Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks

May-23-2023–arXiv.org Artificial Intelligence

Large language models have demonstrated robust performance on various language tasks using zero-shot or few-shot learning paradigms. Multimodal models that can additionally handle images as input are being actively researched, but they have yet to catch up in size and generality with language-only models. In this work, we ask whether language-only models can be utilised for tasks that require visual input and that, as we argue, often also require a strong reasoning component. Similar to some recent related work, we make visual information accessible to the language model using separate verbalisation models. Specifically, we compare the performance of open-source, open-access language models against GPT-3 on five vision-language tasks when given textually encoded visual information. Our results suggest that language models are effective for solving vision-language tasks even with limited samples. This approach also enhances the interpretability of a model's output by providing a means of tracing the output back through the verbalised image content.
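
The verbalise-then-prompt idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Example` class, the `build_prompt` function, the "Image / Question / Answer" template, and the sample captions are all hypothetical, and the captions are assumed to come from some separate verbalisation model that is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One verbalised sample (hypothetical structure, for illustration only)."""
    caption: str      # text description of the image, assumed to be produced elsewhere
    question: str     # task question about the image
    answer: str = ""  # gold answer; left empty for the query sample


def build_prompt(shots: List[Example], query: Example) -> str:
    """Concatenate few-shot examples and the query into one text-only prompt."""
    blocks = []
    for ex in shots + [query]:
        block = f"Image: {ex.caption}\nQuestion: {ex.question}\nAnswer:"
        if ex.answer:
            # Few-shot examples carry their answer; the query ends at "Answer:".
            block += f" {ex.answer}"
        blocks.append(block)
    return "\n\n".join(blocks)


# Made-up data: one in-context example and one query.
shots = [Example("A dog runs on a beach.", "What animal is shown?", "dog")]
query = Example("Two people ride bicycles in a park.", "How many people are there?")
prompt = build_prompt(shots, query)
print(prompt)
```

The resulting string could be passed to any text-only language model. Because the image reaches the model only as the caption text, a prediction can be checked against that caption, which is the kind of traceability the abstract refers to.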

large language model, machine learning, natural language

Country:
- North America > United States
  - Maryland > Baltimore (0.04)
  - Washington > King County
    - Seattle (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
  - Florida > Miami-Dade County
    - Miami (0.14)
  - California
    - Santa Clara County > Palo Alto (0.04)
    - Los Angeles County > Long Beach (0.04)
- Europe
  - Austria (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Germany
    - Brandenburg > Potsdam (0.04)
    - Berlin (0.04)
- Asia
  - China (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.92)
