Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

Berrios, William, Mittal, Gautam, Thrush, Tristan, Kiela, Douwe, Singh, Amanpreet

Jun-28-2023–arXiv.org Artificial Intelligence

We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recognition, as well as on vision and language problems. LENS can be applied to any off-the-shelf LLM and we find that the LLMs with LENS perform highly competitively with much bigger and much more sophisticated systems, without any multimodal training whatsoever. We open-source our code at https://github.com/ContextualAI/lens and provide an interactive demo.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

Jun-28-2023

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
  - New South Wales > Sydney (0.04)
- Asia
  - China (0.04)
  - Philippines (0.04)
  - Middle East > Jordan (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine > Therapeutic Area (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.95)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found