CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Neural Information Processing Systems 

Visual Question Answering (VQA) is an important task in multimodal AI, often used to test the ability of vision-language models to understand and reason over knowledge present in both visual and textual data.
