Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
Neural Information Processing Systems
In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understanding due to the difficulty of 3D-wise spatial feature modeling. In this work, we model SI2T and ST2I together under a dual learning framework. Within the dual framework, we propose to represent the 3D spatial scene features with a novel 3D scene graph (3DSG) representation that can be shared by, and is beneficial to, both tasks.
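As a rough illustration of what a shared 3D scene graph representation might look like, the sketch below encodes objects with coarse 3D positions and directed spatial relations, and linearizes them into triples that a text decoder (SI2T direction) or an image generator's conditioning module (ST2I direction) could consume. The class and field names (`SceneGraph3D`, `to_triples`, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneObject:
    """A node in the 3D scene graph: an object with a coarse 3D location."""
    name: str
    position: Tuple[float, float, float]  # (x, y, depth); units are illustrative

@dataclass
class SpatialRelation:
    """A directed edge: subject --predicate--> object (e.g. 'on', 'left of')."""
    subject: int   # index into SceneGraph3D.objects
    predicate: str
    obj: int

@dataclass
class SceneGraph3D:
    """Hypothetical shared intermediate representation for SI2T and ST2I heads."""
    objects: List[SceneObject] = field(default_factory=list)
    relations: List[SpatialRelation] = field(default_factory=list)

    def to_triples(self) -> List[Tuple[str, str, str]]:
        """Linearize the graph into (subject, predicate, object) triples,
        e.g. as conditioning input for either generation direction."""
        return [
            (self.objects[r.subject].name, r.predicate, self.objects[r.obj].name)
            for r in self.relations
        ]

# Toy scene: "a cup is on the table, to the left of a laptop"
graph = SceneGraph3D(
    objects=[
        SceneObject("cup", (0.2, 0.5, 1.0)),
        SceneObject("table", (0.5, 0.6, 1.2)),
        SceneObject("laptop", (0.7, 0.5, 1.0)),
    ],
    relations=[
        SpatialRelation(0, "on", 1),
        SpatialRelation(0, "left of", 2),
    ],
)
print(graph.to_triples())
# [('cup', 'on', 'table'), ('cup', 'left of', 'laptop')]
```

In a dual setup, both tasks would read from and write to this one structure, so spatial evidence recovered in one direction can supervise the other.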