3D-Aware Visual Question Answering about Parts, Poses and Occlusions

Jan-19-2025, 20:19:41 GMT–Neural Information Processing Systems

Despite rapid progress in visual question answering (VQA), existing datasets and models mainly focus on testing reasoning in 2D. However, VQA models must also understand the 3D structure of visual scenes, for example to support tasks such as navigation or manipulation. This includes understanding the 3D poses of objects, their parts, and occlusions. In this work, we introduce the task of 3D-aware VQA, which focuses on challenging questions that require compositional reasoning over the 3D structure of visual scenes. We address 3D-aware VQA from both the dataset and the model perspective.

3D-aware VQA, pose and occlusion, visual scene
