CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Neural Information Processing Systems

Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data.
RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization

Neural Information Processing Systems

Visual Reinforcement Learning (Visual RL), coupled with high-dimensional observations, has consistently confronted the long-standing challenge of out-of-distribution generalization.