CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

David Romero

Neural Information Processing Systems 

Visual Question Answering (VQA) is an important task in multimodal AI, often used to test how well vision-language models understand and reason over knowledge present in both visual and textual data.
