CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea
arXiv.org Artificial Intelligence
Recent years have witnessed a significant increase in the performance of Vision and Language tasks. Foundational Vision-Language Models (VLMs), such as CLIP, have been leveraged in multiple settings and demonstrated remarkable performance across several tasks. Such models excel at object-centric recognition yet learn text representations that seem invariant to word order, failing to compose known concepts in novel ways. However, no evidence exists that any VLM, including large-scale single-stream models such as GPT-4V, identifies compositions successfully. In this paper, we introduce a framework to significantly improve the ability of existing models to encode compositional language, with over 10% absolute improvement on compositionality benchmarks, while maintaining or improving the performance on standard object-recognition and retrieval benchmarks. Our code and pre-trained models are publicly available at https://github.com/netflix/clove.
Feb-29-2024
- Country:
  - Asia > Middle East
    - UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Europe
    - Ireland (0.04)
    - Netherlands > South Holland
      - Dordrecht (0.04)
  - North America
    - Canada
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
      - Ontario > Toronto (0.14)
    - Dominican Republic (0.04)
    - United States
      - Michigan > Washtenaw County
        - Ann Arbor (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
  - Oceania > Australia
  - South America > Chile
- Genre:
  - Research Report (1.00)
- Industry:
  - Information Technology > Services (0.34)
  - Leisure & Entertainment (0.34)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (0.47)
    - Natural Language (1.00)
    - Vision (1.00)