ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Neural Information Processing Systems
Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these foundation models due to the lack of easy-to-use toolkits for fair benchmarking. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark to compare and evaluate pre-trained language-augmented visual models. Several highlights include: (i) Datasets. As downstream evaluation suites, ELEVATER consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
Dec-24-2025, 01:48:48 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language (0.40)
- Vision (0.60)