Goto

Collaborating Authors

 scene graph



SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models

Neural Information Processing Systems

Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs' spatial perception and reasoning capabilities.


Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image Y u Zhao

Neural Information Processing Systems

In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understanding, due to the difficulty of 3D-wise spatial feature modeling.



4D Panoptic Scene Graph Generation Jingkang Y ang

Neural Information Processing Systems

Traditional 3D scene graph methods may recognize the static elements of this scene, such as identifying a booth situated on the ground. However, a more ideal, advanced, and dynamic perception is required for real-world scenarios.