ColorFoil: Investigating Color Blindness in Large Vision and Language Models
Samin, Ahnaf Mozib, Ahmed, M. Firoz, Rafee, Md. Mushtaq Shahriyar
arXiv.org Artificial Intelligence
Built on the Transformer architecture, large Vision and Language (V&L) models have shown promising performance even in zero-shot settings. Several studies, however, indicate that the models lack robustness when dealing with complex linguistic and visual attributes. In this work, we introduce a novel V&L benchmark, ColorFoil, by creating color-related foils to assess the models' ability to perceive colors such as red, white, and green. We evaluate seven state-of-the-art V&L models, including CLIP, ViLT, GroupViT, and BridgeTower, in a zero-shot setting and present intriguing findings from the V&L models.
In this benchmark, foils are generated from the existing V&L datasets for each of the tasks. A foil is a distractor, or slightly incorrect example, that is passed along with the correct example to the V&L model to assess the model's ability to correctly distinguish them [17, 22]. Although existing V&L benchmarks like VALSE help the community to test the capabilities of V&L models, there is still much work to be done to evaluate the robustness and generalizability of the models on numerous other tasks. It remains unknown how well the large V&L models can perceive colors from
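The foil idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the color list `COLORS`, the helpers `make_color_foil` and `foil_accuracy`, and the `score(image, text)` callable are all hypothetical names introduced here, and the sketch assumes a foil is formed by swapping each color word in a caption for a different color. A model counts as correct on an example when it scores the original caption above the foil for the same image.

```python
import random
import re

# Assumed color vocabulary for illustration; the benchmark's actual list is not given here.
COLORS = ["red", "white", "green", "blue", "black", "yellow"]
_COLOR_RE = re.compile(r"\b(" + "|".join(COLORS) + r")\b", re.IGNORECASE)


def make_color_foil(caption, rng):
    """Swap every color word in `caption` for a different color.

    Returns None when the caption contains no color word, since no
    color foil can be built from it.
    """
    if not _COLOR_RE.search(caption):
        return None

    def swap(match):
        original = match.group(0).lower()
        # Choose any color except the one being replaced.
        return rng.choice([c for c in COLORS if c != original])

    return _COLOR_RE.sub(swap, caption)


def foil_accuracy(examples, score):
    """Fraction of (image, caption, foil) triples where the caption outscores the foil.

    `score(image, text)` is a hypothetical stand-in for any image-text
    matching score, for example the similarity produced by a V&L model.
    """
    examples = list(examples)
    if not examples:
        return 0.0
    correct = sum(
        1 for image, caption, foil in examples
        if score(image, caption) > score(image, foil)
    )
    return correct / len(examples)
```

In use, `score` would wrap a zero-shot model's image-text similarity, and the triples would come from captions that mention at least one color; captions for which `make_color_foil` returns None would be skipped.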
May-19-2024
- Country:
  - Europe
    - Switzerland > Zürich
      - Zürich (0.14)
    - Netherlands
      - North Holland > Amsterdam (0.04)
      - Groningen (0.04)
    - Middle East > Malta
      - Eastern Region > Northern Harbour District > Msida (0.04)
  - Asia > Bangladesh
    - Sylhet Division > Sylhet District > Sylhet (0.04)
- Genre:
  - Research Report (0.82)
- Technology: