Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling
–arXiv.org Artificial Intelligence
Abstract-- T ask and Motion Planning (T AMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly in long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic T AMP framework based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, enabling task and motion decisions to be jointly decided. Kinodynamic constraints embedded in the T AMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides exploring a T AMP solution and backtracks the search based on visual rendering of the states. I. INTRODUCTION Robotic manipulation tasks, such as tabletop manipulations, require reasoning over both symbolic task decisions and continuous geometric feasibility. A robot must decide which action to perform--such as picking, placing, or stacking-- and which object to grasp, which constitutes a discrete search process. Simultaneously, it must determine grasp poses, feasible end-effector configurations, and collision-free motion trajectories governed by continuous constraints. This class of problems is studied under the framework of Task and Motion Planning (i.e., T AMP), which combines high-level task planning with continuous action parameter binding and low-level motion planning [1], [2].
arXiv.org Artificial Intelligence
Oct-31-2025