Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle, Radu Timofte, Goran Glavaš
arXiv.org Artificial Intelligence
Vision-and-language (VL) models with separate encoders for each modality (e.g., CLIP) have become the go-to models for zero-shot image classification and image-text retrieval. Most evaluation of these models is, however, performed with English text only: the costly creation of language-specific image-caption datasets has limited multilingual VL benchmarks to a handful of high-resource languages. In this work, we introduce Babel-ImageNet, a massively multilingual benchmark that offers (partial) translations of 1000 ImageNet labels into 92 languages, built without resorting to machine translation (MT) or requiring manual annotation. We instead automatically obtain reliable translations of ImageNet concepts by linking them, via shared WordNet synsets, to BabelNet, a massively multilingual lexico-semantic network. We evaluate 8 different publicly available multilingual CLIP models on zero-shot image classification (ZS-IC) for each of the 92 Babel-ImageNet languages, demonstrating a significant gap between English ImageNet performance and that of high-resource languages (e.g., German or Chinese), and an even bigger gap for low-resource languages (e.g., Sinhala or Lao). Crucially, we show that the models' ZS-IC performance on Babel-ImageNet correlates strongly with their performance in image-text retrieval, validating that Babel-ImageNet is suitable for estimating the quality of the multilingual VL representation spaces for the vast majority of languages that lack gold image-text data. Finally, we show that the performance of multilingual CLIP for low-resource languages can be drastically improved via cheap, parameter-efficient language-specific training. We make our code and data publicly available at https://github.com/gregor-ge/Babel-ImageNet
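As a rough illustration of the zero-shot image classification (ZS-IC) setup the abstract refers to, the sketch below picks the label whose text embedding has the highest cosine similarity to an image embedding. It is not the authors' code: the function names `cosine` and `zero_shot_classify` are hypothetical, and the 3-dimensional vectors are made-up stand-ins for the outputs of a multilingual CLIP model's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the label whose text embedding is closest to the image embedding."""
    return max(
        label_embeddings,
        key=lambda label: cosine(image_embedding, label_embeddings[label]),
    )


# Toy embeddings (assumption): in practice these would come from the text
# encoder applied to translated class labels, here three German labels.
german_labels = {
    "Hund": [0.9, 0.1, 0.0],
    "Katze": [0.1, 0.9, 0.0],
    "Auto": [0.0, 0.1, 0.9],
}

# Toy image embedding (assumption): stands in for the image encoder output.
image_embedding = [0.8, 0.3, 0.1]

predicted = zero_shot_classify(image_embedding, german_labels)
print(predicted)
```

Swapping the label dictionary for the labels of another language, while keeping the image embeddings fixed, is what makes a per-language comparison of this kind possible.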
Jun-14-2023
- Country:
- South America
- Suriname > Marowijne District
- Albina (0.04)
- Colombia > Meta Department
- Villavicencio (0.04)
- Chile > Santiago Metropolitan Region
- Santiago Province > Santiago (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Maryland > Baltimore (0.04)
- New Jersey (0.04)
- Rhode Island > Providence County
- Providence (0.04)
- New York > New York County
- New York City (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > Los Angeles County
- Long Beach (0.14)
- Canada > British Columbia
- Europe
- Austria (0.04)
- Belgium (0.04)
- Switzerland > Zürich
- Zürich (0.14)
- Germany
- Berlin (0.04)
- Bavaria > Lower Franconia
- Würzburg (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Sweden > Uppsala County
- Uppsala (0.04)
- Romania > Sud - Muntenia Development Region
- Giurgiu County > Giurgiu (0.04)
- United Kingdom > England
- Tyne and Wear > Newcastle (0.04)
- France
- Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
- Marseille (0.04)
- Auvergne-Rhône-Alpes > Lyon
- Lyon (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia
- India (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Genre:
- Research Report (1.00)
- Technology: