SOFAR: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
–Neural Information Processing Systems
While spatial reasoning has made progress in understanding object location relationships, it often overlooks object orientation, a key factor in fine-grained 6-DoF manipulation. Traditional pose representations rely on pre-defined frames or templates, which limits generalization and semantic grounding. In this paper, we introduce the concept of semantic orientation, which defines object orientations using natural language in a reference-frame-free manner (e.g., the "plug-in" direction of a USB or the "handle" direction of a cup). To support this, we construct OrienText300K, a large-scale dataset of 3D objects annotated with semantic orientations, and develop PointSO, a general model for zero-shot semantic orientation prediction. By integrating semantic orientation into VLM agents, our SOFAR framework enables 6-DoF spatial reasoning and generates robotic actions. Extensive experiments demonstrate the effectiveness and generalization of SOFAR, e.g., zero-shot success rates of 48.7% on Open6DOR and 74.9% on SIMPLER-Env.
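To make the idea of a semantic orientation concrete, the following is a minimal illustrative sketch and not the PointSO or SOFAR implementation. It assumes that a semantic orientation can be stored as a language phrase paired with a unit direction vector, and it measures how far that direction is from a desired target direction. The names `normalize`, `angle_between`, `orientation_error`, and the `cup` dictionary with its vectors are hypothetical and chosen only for illustration.

```python
import math

def normalize(v):
    """Return v scaled to unit length; reject zero-length vectors."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length direction")
    return tuple(c / n for c in v)

def angle_between(a, b):
    """Angle in degrees between two 3D directions."""
    a, b = normalize(a), normalize(b)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))

# Hypothetical example: each phrase maps to a unit direction for one object.
# The vectors are made up for illustration, not taken from OrienText300K.
cup = {
    "handle": normalize((1.0, 0.0, 0.0)),
    "opening": normalize((0.0, 0.0, 1.0)),
}

def orientation_error(orientations, phrase, target):
    """Angular error between the phrase's direction and a target direction."""
    return angle_between(orientations[phrase], target)
```

Under these assumptions, an instruction such as "point the handle toward +y" reduces to driving `orientation_error(cup, "handle", (0, 1, 0))` toward zero, without reference to any pre-defined object frame or template.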
Jun-18-2026, 07:22:28 GMT
- Country:
- North America > United States (1.00)
- Europe (1.00)
- Asia (1.00)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.87)
- Industry:
- Leisure & Entertainment (0.67)
- Information Technology (0.46)
- Technology: