SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models
–Neural Information Processing Systems
Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs' spatial perception and reasoning capabilities.
Oct-10-2025, 21:26:43 GMT
- Country:
- Europe > France
- Bourgogne-Franche-Comté > Doubs > Besançon (0.04)
- North America > United States
- California > San Diego County > San Diego (0.04)
- South America > Brazil (0.04)
- Genre:
- Research Report
- Experimental Study (0.93)
- New Finding (0.93)
- Industry:
- Information Technology (0.46)
- Technology: