LOVM: Language-Only Vision Model Selection

Orr Zohar, Shih-Cheng Huang, Kuan-Chieh Wang, Serena Yeung

Jun-15-2023–arXiv.org Artificial Intelligence

Pre-trained multi-modal vision-language models (VLMs) are becoming increasingly popular due to their exceptional performance on downstream vision applications, particularly in the few- and zero-shot settings. However, selecting the best-performing VLM for a given downstream application is non-trivial, as the best choice is dataset- and task-dependent. Meanwhile, exhaustively evaluating all available VLMs on a novel application is not only time-consuming and computationally demanding but also requires collecting a labeled dataset for evaluation. As the number of open-source VLM variants increases, there is a need for an efficient model selection strategy that does not require access to a curated evaluation dataset. This paper proposes a novel task and benchmark for efficiently evaluating VLMs' zero-shot performance on downstream applications without access to the downstream task dataset. Specifically, we introduce a new task, LOVM: Language-Only Vision Model Selection, where methods are expected to perform both model selection and performance prediction based solely on a text description of the desired downstream application. We then introduce an extensive LOVM benchmark consisting of ground-truth evaluations of 35 pre-trained VLMs on 23 datasets, where methods are expected to rank the pre-trained VLMs and predict their zero-shot performance.
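The two outputs the task asks for, a ranking of VLMs and a predicted zero-shot performance for each, can be scored against ground-truth evaluations roughly as sketched below. The abstract does not name the evaluation metrics, so top-k recall, Kendall's tau, and mean absolute error are illustrative assumptions here, and the model names and accuracy values are hypothetical rather than taken from the benchmark.

```python
from itertools import combinations


def rank_models(scores):
    """Return model names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def top_k_recall(predicted, actual, k):
    """Fraction of the true top-k models that also appear in the predicted top-k."""
    pred_top = set(rank_models(predicted)[:k])
    true_top = set(rank_models(actual)[:k])
    return len(pred_top & true_top) / k


def kendall_tau(predicted, actual):
    """Kendall's tau-a over all model pairs (requires at least two models).

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for a, b in combinations(sorted(actual), 2):
        sign = (predicted[a] - predicted[b]) * (actual[a] - actual[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(actual) * (len(actual) - 1) // 2
    return (concordant - discordant) / n_pairs


def mean_abs_error(predicted, actual):
    """Average absolute gap between predicted and measured performance."""
    return sum(abs(predicted[m] - actual[m]) for m in actual) / len(actual)


# Hypothetical values: accuracy predicted from a text description alone,
# and the accuracy measured on the labeled dataset.
predicted = {"vlm_a": 0.71, "vlm_b": 0.64, "vlm_c": 0.58, "vlm_d": 0.60}
actual = {"vlm_a": 0.69, "vlm_b": 0.66, "vlm_c": 0.61, "vlm_d": 0.55}

print(rank_models(predicted))
print(top_k_recall(predicted, actual, k=2))
print(kendall_tau(predicted, actual))
print(mean_abs_error(predicted, actual))
```

Separating the ranking metrics from the error metric reflects the two sub-tasks: a method can order the models well while still misjudging their absolute accuracy, and the reverse.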

large language model, machine learning, natural language

Country:
- Europe > Switzerland (0.04)
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - California > Santa Clara County
    - Palo Alto (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Information Technology (1.00)
- Health & Medicine (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (0.46)
