Domain-Conditioned Scene Graphs for State-Grounded Task Planning

Sep-3-2025–arXiv.org Artificial Intelligence

Recent robotic task planning frameworks have integrated large multimodal models (LMMs) such as GPT-4o. To address grounding issues of such models, it has been suggested to split the pipeline into perceptional state grounding and subsequent state-based planning. As we show in this work, the state grounding ability of LMM-based approaches is still limited by weaknesses in granular, structured, domain-specific scene understanding. To address this shortcoming, we develop a more structured state grounding framework that features a domain-conditioned scene graph as its scene representation. We show that such a representation is actionable in nature, as it is directly mappable to a symbolic state in planning languages such as the Planning Domain Definition Language (PDDL). We provide an instantiation of our state grounding framework where the domain-conditioned scene graph generation is implemented with a lightweight vision-language approach that classifies domain-specific predicates on top of domain-relevant object detections. Evaluated across three domains, our approach achieves significantly higher state grounding accuracy and task planning success rates compared to LMM-based approaches.

I. INTRODUCTION

Task planning in a real environment relies on two core capabilities: (a) reasoning to find an action plan that fulfills the goal, and (b) scene understanding to accurately recognize the state of the environment [1]. Traditionally, these capabilities had to be learned through in-domain training, which resulted in models that could only perform well within specific tasks, objects, or environments.
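The abstract's claim that a domain-conditioned scene graph is directly mappable to a symbolic PDDL state can be illustrated with a minimal sketch. This is not the authors' implementation: the `SceneGraph` and `Relation` classes, the `to_pddl_problem` function, and the blocksworld-style objects and predicates in the usage example are hypothetical names chosen for illustration. The sketch assumes that graph nodes carry a PDDL type and that every node attribute or edge is a ground predicate over object names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One ground predicate: a unary node attribute or an n-ary edge."""
    predicate: str         # e.g. "on" or "clear" (hypothetical domain predicates)
    args: tuple[str, ...]  # names of the objects the predicate applies to


@dataclass
class SceneGraph:
    """Hypothetical scene graph: typed objects plus predicates over them."""
    objects: dict[str, str] = field(default_factory=dict)   # name -> PDDL type
    relations: list[Relation] = field(default_factory=list)


def fact_to_pddl(rel: Relation) -> str:
    """Render one relation as a PDDL ground atom, e.g. '(on a b)'."""
    return "(" + " ".join([rel.predicate, *rel.args]) + ")"


def to_pddl_problem(graph: SceneGraph, domain: str, problem: str, goal: str) -> str:
    """Serialize the scene graph as the :objects and :init of a PDDL problem.

    The goal is passed in as a ready-made PDDL expression, because it comes
    from the task specification rather than from perception.
    """
    objects = "\n    ".join(
        f"{name} - {typ}" for name, typ in sorted(graph.objects.items())
    )
    facts = "\n    ".join(fact_to_pddl(rel) for rel in graph.relations)
    return (
        f"(define (problem {problem})\n"
        f"  (:domain {domain})\n"
        f"  (:objects\n    {objects})\n"
        f"  (:init\n    {facts})\n"
        f"  (:goal {goal}))"
    )


if __name__ == "__main__":
    # Illustrative scene only; the domain is not taken from the paper.
    graph = SceneGraph(
        objects={"block_a": "block", "block_b": "block"},
        relations=[
            Relation("on", ("block_a", "block_b")),
            Relation("clear", ("block_a",)),
        ],
    )
    print(to_pddl_problem(graph, "blocksworld", "p1", "(on block_b block_a)"))
```

Because each node and edge becomes exactly one typed object declaration or ground atom, the mapping needs no further interpretation step, which is one way to read the "actionable" property described above. The resulting problem text could then be handed to any PDDL planner together with a matching domain file.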

artificial intelligence, natural language, planning & scheduling

Country:
- Asia (0.15)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Robots > Robot Planning & Action (1.00)
  - Representation & Reasoning > Planning & Scheduling (1.00)
  - Natural Language (1.00)