Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
Yu Zhao
Neural Information Processing Systems
In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understanding, due to the difficulty of 3D-wise spatial feature modeling.
Oct-10-2025, 18:53:08 GMT
- Country:
- Asia
- China
- Guangdong Province > Shenzhen (0.04)
- Heilongjiang Province > Harbin (0.04)
- Shandong Province > Qingdao (0.04)
- Tianjin Province > Tianjin (0.04)
- Japan > Kyūshū & Okinawa
- Okinawa (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Singapore (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Europe
- Austria (0.04)
- France > Hauts-de-France
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Tuscany
- Florence (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- North America
- Canada > Ontario
- National Capital Region > Ottawa (0.04)
- Dominican Republic (0.04)
- United States
- California > Los Angeles County
- Long Beach (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Maryland > Baltimore (0.04)
- Utah > Salt Lake County
- Salt Lake City (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Genre:
- Research Report > Experimental Study (1.00)
- Technology: