RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation
Liu, Haichao, Guo, Sikai, Mai, Pengfei, Cao, Jiahang, Li, Haoang, Ma, Jun
This paper introduces RoboDexVLM, an innovative framework for robot task planning and grasp detection tailored to a collaborative manipulator equipped with a dexterous hand. Previous methods focus on simplified and limited manipulation tasks, often neglecting the complexities of grasping a diverse array of objects over long horizons. In contrast, the proposed framework employs a dexterous hand capable of grasping objects of varying shapes and sizes while executing tasks based on natural language commands. The approach has two core components. First, a robust task planner with a task-level recovery mechanism that leverages vision-language models (VLMs), enabling the system to interpret and execute open-vocabulary commands for long-sequence tasks. Second, a language-guided dexterous grasp perception algorithm based on robot kinematics and formal methods, tailored for zero-shot dexterous manipulation with diverse objects and commands. Experimental results highlight the framework's ability to operate in complex environments and showcase its potential for open-vocabulary dexterous manipulation.

Robotic manipulation has become a cornerstone of modern technological progress, driving advances in manufacturing, healthcare, and domestic automation. By bridging perception, reasoning, and physical interaction, such systems enhance productivity, enable safe operation in hazardous environments, and address critical societal challenges such as labor shortages.
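To make the task-level recovery mechanism concrete, the following is a minimal Python sketch of a VLM-driven planner loop under stated assumptions: the names query_vlm, execute_skill, and the Step structure are hypothetical placeholders for illustration, not RoboDexVLM's actual interface, and the paper's VLM prompting scheme and skill library are not reproduced here.

from dataclasses import dataclass

@dataclass
class Step:
    skill: str    # primitive skill, e.g. "grasp", "move_to", "release"
    target: str   # open-vocabulary object description from the command

def query_vlm(command: str, failed_step: Step | None = None) -> list[Step]:
    # Placeholder for a call to a vision-language model that decomposes
    # the natural-language command into a skill sequence; on recovery
    # calls it also receives the failed step as context for replanning.
    if failed_step is None:
        return [Step("grasp", "red mug"), Step("move_to", "shelf"),
                Step("release", "red mug")]
    return [Step(failed_step.skill, failed_step.target)]  # naive retry plan

def execute_skill(step: Step) -> bool:
    # Placeholder for language-guided grasp perception and motion control;
    # returns True on success, False on failure.
    return True

def run_task(command: str, max_replans: int = 3) -> bool:
    plan = query_vlm(command)
    replans = 0
    while plan:
        step = plan.pop(0)
        if execute_skill(step):
            continue
        # Task-level recovery: instead of aborting on a failed step,
        # ask the VLM to replan the remaining task with failure context.
        replans += 1
        if replans > max_replans:
            return False
        plan = query_vlm(command, failed_step=step)
    return True

if __name__ == "__main__":
    print(run_task("put the red mug on the shelf"))

The design point this sketch illustrates is that recovery happens at the task level: a failed step triggers a replanning query with failure context rather than a blind retry or a full abort, which is what allows long-horizon command sequences to survive individual skill failures.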
arXiv.org Artificial Intelligence
Mar-3-2025