ColorFoil: Investigating Color Blindness in Large Vision and Language Models

Samin, Ahnaf Mozib; Ahmed, M. Firoz; Rafee, Md. Mushtaq Shahriyar

arXiv.org Artificial Intelligence 

With the utilization of the Transformer architecture, large Vision and Language (V&L) models have shown promising performance even in zero-shot settings. Several studies, however, indicate a lack of robustness when these models deal with complex linguistic and visual attributes. In this work, we introduce a novel V&L benchmark, ColorFoil, which creates color-related foils to assess the models' ability to perceive colors such as red, white, and green. We evaluate seven state-of-the-art V&L models, including CLIP, ViLT, GroupViT, and BridgeTower, in a zero-shot setting and present intriguing findings from the V&L models.

In this benchmark, foils are generated from the existing V&L datasets for each of the tasks. A foil is a distractor: a slightly incorrect example that is passed to the V&L model along with the correct example to assess the model's ability to distinguish them correctly [17, 22]. Although existing V&L benchmarks like VALSE help the community test the capabilities of V&L models, much work remains to evaluate the robustness and generalizability of these models on numerous other tasks. It remains unknown how well large V&L models can perceive colors from
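To make the foil idea concrete, here is a minimal Python sketch. It assumes that a color foil is made by swapping one color word in a correct caption for a different color. It also assumes that a zero-shot model (for example CLIP) has already produced an image-text similarity score for both the caption and its foil. The color list, `make_color_foil`, and `foil_accuracy` are hypothetical illustrations. They are not the authors' ColorFoil construction procedure or evaluation code.

```python
import random
import re

# Hypothetical color vocabulary; the actual ColorFoil word list may differ.
COLORS = ["red", "white", "green", "blue", "black",
          "yellow", "brown", "orange", "pink", "gray"]


def make_color_foil(caption, colors=COLORS, rng=None):
    """Return a foil caption in which the first color word is replaced by a
    different color, or None if the caption mentions no known color.
    (Illustrative assumption, not the paper's exact procedure.)"""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, colors)) + r")\b", re.IGNORECASE
    )
    match = pattern.search(caption)
    if match is None:
        return None
    original = match.group(0).lower()
    # Pick any color other than the one actually in the caption.
    replacement = rng.choice([c for c in colors if c != original])
    return caption[:match.start()] + replacement + caption[match.end():]


def foil_accuracy(pairs):
    """Fraction of (caption_score, foil_score) pairs in which the correct
    caption scores strictly higher than its foil."""
    if not pairs:
        raise ValueError("need at least one scored pair")
    correct = sum(1 for caption_score, foil_score in pairs
                  if caption_score > foil_score)
    return correct / len(pairs)


if __name__ == "__main__":
    caption = "A red bus parked next to a white building."
    print(make_color_foil(caption))
    # Placeholder scores standing in for a zero-shot V&L model's outputs.
    print(foil_accuracy([(0.31, 0.27), (0.22, 0.25), (0.40, 0.35)]))
```

In a real evaluation, the placeholder scores would come from a V&L model's image-text matching output. This sketch counts a pair as correct only when the true caption strictly outscores the foil. That is one common convention, assumed here rather than taken from the paper.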
