Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh
–arXiv.org Artificial Intelligence
Fine-grained multimodal skills (e.g., understanding relationships and recognising verbs) require identifying and relating various entities across both image and text modalities. Vision-and-language models (VLMs) need such skills to robustly perform well on real-world vision-and-language (V&L) applications; e.g., a coarse-grained model tested on image retrieval to "find an image where something is on a sofa" might incorrectly return an image of a cat sitting below the sofa. As another example, in captioning, a model might incorrectly describe an image where "someone is selling a sweater" as "someone is buying a sweater," if it does not have a precise understanding of the two verbs.

First we consider: Which models perform well on fine-grained tasks? To answer this, we evaluate models from four different model families trained with different amounts of pretraining data, as well as recent architectures that leverage frozen large language models (LLMs). We observe that modelling innovations have more impact than simply scaling image captions from the Web. Furthermore, explicitly modelling localisation can improve performance, but it is crucial how it is done, and simply using localisation data is not enough. Our observations motivate our next question: How do data and losses impact fine-grained understanding? We focus our study on the best performing …
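The retrieval failure described above can be made concrete with a caption-contrast probe, in the spirit of the fine-grained benchmarks this line of work evaluates on: a VLM scores a single image against two captions that differ only in a relation or verb, and a fine-grained model should prefer the correct one. The sketch below is illustrative only, not the paper's evaluation code; the CLIP checkpoint and the image path are assumptions chosen for the example.

```python
# Minimal caption-contrast probe (illustrative sketch, not from the paper).
# Assumes a generic CLIP checkpoint and a hypothetical local test image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat_below_sofa.jpg")  # hypothetical image of a cat under a sofa
captions = ["a cat on a sofa", "a cat below a sofa"]  # differ only in the relation

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity for each caption;
# a model with fine-grained relational understanding should rank
# "a cat below a sofa" above "a cat on a sofa" for this image.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{caption}: {p:.3f}")
```

The same template extends to verb contrasts (e.g., "selling" vs. "buying a sweater"): only the caption pair changes, which is why such probes isolate fine-grained skills from coarse image-text matching.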
May-12-2023