Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
Neural Information Processing Systems
In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understanding due to the difficulty of 3D-wise spatial feature modeling. In this work, we model SI2T and ST2I together under a dual learning framework. Within the dual framework, we propose to represent the 3D spatial scene features with a novel 3D scene graph (3DSG) representation that can be shared by, and is beneficial to, both tasks.
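As a rough illustration of what a shared 3D scene graph representation might look like, the sketch below encodes objects with coarse 3D positions and directed spatial relations, and linearizes them into triples that a text decoder (SI2T direction) or an image generator's conditioning module (ST2I direction) could consume. The class and field names (`SceneGraph3D`, `to_triples`, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneObject:
    """A node in the 3D scene graph: an object with a coarse 3D location."""
    name: str
    position: Tuple[float, float, float]  # (x, y, depth); units are illustrative

@dataclass
class SpatialRelation:
    """A directed edge: subject --predicate--> object (e.g. 'on', 'left of')."""
    subject: int   # index into SceneGraph3D.objects
    predicate: str
    obj: int

@dataclass
class SceneGraph3D:
    """Hypothetical shared intermediate representation for SI2T and ST2I heads."""
    objects: List[SceneObject] = field(default_factory=list)
    relations: List[SpatialRelation] = field(default_factory=list)

    def to_triples(self) -> List[Tuple[str, str, str]]:
        """Linearize the graph into (subject, predicate, object) triples,
        e.g. as conditioning input for either generation direction."""
        return [
            (self.objects[r.subject].name, r.predicate, self.objects[r.obj].name)
            for r in self.relations
        ]

# Toy scene: "a cup is on the table, to the left of a laptop"
graph = SceneGraph3D(
    objects=[
        SceneObject("cup", (0.2, 0.5, 1.0)),
        SceneObject("table", (0.5, 0.6, 1.2)),
        SceneObject("laptop", (0.7, 0.5, 1.0)),
    ],
    relations=[
        SpatialRelation(0, "on", 1),
        SpatialRelation(0, "left of", 2),
    ],
)
print(graph.to_triples())
# [('cup', 'on', 'table'), ('cup', 'left of', 'laptop')]
```

In a dual setup, both tasks would read from and write to this one structure, so spatial evidence recovered in one direction can supervise the other.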