Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs
Dinesh, Sushil Samuel; Park, Shinkyu
arXiv.org Artificial Intelligence
This paper presents a framework that leverages pre-trained foundation models for robotic manipulation without domain-specific training. The framework integrates off-the-shelf models, combining multimodal perception from foundation models with a general-purpose reasoning model capable of robust task sequencing. Scene graphs, dynamically maintained within the framework, provide spatial awareness and enable consistent reasoning about the environment. The framework is evaluated through a series of tabletop robotic manipulation experiments, and the results highlight its potential for building robotic manipulation systems directly on top of off-the-shelf foundation models.
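The abstract says that scene graphs are maintained dynamically so that the reasoning model has consistent spatial awareness. The sketch below is not the authors' implementation. It is a minimal illustration under stated assumptions: objects are nodes, spatial relations are (subject, relation, object) triples, the only action is pick-and-place, and `SceneGraph`, `apply_place`, `first_infeasible_step`, and the `table` support surface are hypothetical names. It shows three things: updating the graph after an action, serializing the graph to text for a prompt, and dry-running a proposed task sequence to find the first infeasible step.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field

TABLE = "table"  # hypothetical fixed support surface (assumption, not from the paper)


@dataclass
class SceneGraph:
    """Hypothetical tabletop scene graph: object nodes plus (subject, relation, object) edges."""
    objects: dict[str, dict] = field(default_factory=dict)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_object(self, name: str, **attrs) -> None:
        self.objects[name] = attrs

    def add_relation(self, subj: str, rel: str, obj: str) -> None:
        self.relations.add((subj, rel, obj))

    def on_top_of(self, name: str) -> set[str]:
        """Objects currently resting directly on `name`."""
        return {s for (s, r, o) in self.relations if r == "on" and o == name}

    def apply_place(self, obj: str, target: str) -> None:
        """Update the graph after placing `obj` on `target`; raise ValueError if infeasible."""
        if obj not in self.objects or (target != TABLE and target not in self.objects):
            raise ValueError(f"unknown object in step ({obj}, {target})")
        if obj == target:
            raise ValueError(f"cannot place {obj} on itself")
        if self.on_top_of(obj):
            raise ValueError(f"cannot pick {obj}: it is covered")
        if target != TABLE and self.on_top_of(target):
            raise ValueError(f"cannot place on {target}: it is covered")
        # Drop the old support edge for obj, then record the new one.
        self.relations = {e for e in self.relations if not (e[0] == obj and e[1] == "on")}
        self.relations.add((obj, "on", target))

    def to_text(self) -> str:
        """Serialize edges as sorted lines, e.g. for inclusion in a reasoning-model prompt."""
        return "\n".join(f"{s} {r} {o}" for (s, r, o) in sorted(self.relations))


def first_infeasible_step(graph: SceneGraph, steps: list[tuple[str, str]]) -> int | None:
    """Dry-run a proposed plan on a copy; return the index of the first failing step, else None."""
    sim = copy.deepcopy(graph)  # the real graph is left untouched
    for i, (obj, target) in enumerate(steps):
        try:
            sim.apply_place(obj, target)
        except ValueError:
            return i
    return None


# Usage: blue_block is stacked on red_block; green_block sits on the table.
g = SceneGraph()
for name in ("red_block", "blue_block", "green_block"):
    g.add_object(name)
g.add_relation("red_block", "on", TABLE)
g.add_relation("blue_block", "on", "red_block")
g.add_relation("green_block", "on", TABLE)

print(first_infeasible_step(g, [("red_block", "green_block")]))  # 0: red_block is covered
print(first_infeasible_step(g, [("blue_block", TABLE), ("red_block", "green_block")]))  # None
```

Keeping the graph as plain triples makes the text handed to a general-purpose reasoning model easy to regenerate after every action. The dry run is one possible way to check a proposed sequence before execution. A real system would also refresh the graph from perception rather than trusting the simulated update alone.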
Nov-3-2025