SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models

Oct-10-2025, 21:26:43 GMT–Neural Information Processing Systems

Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs' spatial perception and reasoning capabilities.

Country:
- South America > Brazil (0.04)
- North America > United States
  - California > San Diego County > San Diego (0.04)
- Europe > France
  - Bourgogne-Franche-Comté > Doubs > Besançon (0.04)

Genre:
- Research Report
  - New Finding (0.93)
  - Experimental Study (0.93)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks (0.93)
  - Representation & Reasoning > Spatial Reasoning (0.84)