ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

Dec-24-2025, 01:48:48 GMT–Neural Information Processing Systems

Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these foundation models due to the lack of easy-to-use toolkits for fair benchmarking. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark to compare and evaluate pre-trained language-augmented visual models. Several highlights include: (i) Datasets. As downstream evaluation suites, ELEVATER consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.

artificial intelligence, natural language, proceedings

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.40)
  - Vision (0.60)