UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms
Guo, Xueyang, Hu, Hongwei, Song, Chengye, Chen, Jiale, Zhao, Zilin, Fu, Yu, Guan, Bowen, Liu, Zhenze
arXiv.org Artificial Intelligence
Abstract: Open-vocabulary, task-oriented grasping of specific functional parts, particularly with dual arms, remains a key challenge. Current Vision-Language Models (VLMs) enhance task understanding but often struggle with precise grasp generation within defined constraints and with effective dual-arm coordination. We propose UniDiffGrasp, a unified framework that integrates VLM reasoning with guided part diffusion to address these limitations. UniDiffGrasp leverages a VLM to interpret user input and identify semantic targets (object, part(s), mode), which are then grounded via open-vocabulary segmentation. Critically, the identified parts directly provide geometric constraints for a Constrained Grasp Diffusion Field (CGDF) using its Part-Guided Diffusion, enabling efficient, high-quality 6-DoF grasps without retraining. For dual-arm tasks, UniDiffGrasp defines distinct target regions, applies part-guided diffusion per arm, and selects stable cooperative grasps. In extensive real-world deployment, UniDiffGrasp achieves grasp success rates of 0.876 in single-arm and 0.767 in dual-arm scenarios, significantly surpassing existing state-of-the-art methods and demonstrating precise, coordinated open-vocabulary grasping in complex real-world scenarios.

I. INTRODUCTION

The ambition for robots to seamlessly integrate into human environments as capable assistants hinges on their ability to perform dexterous, task-oriented manipulation.
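As a rough illustration of the dual-arm step mentioned in the abstract (per-arm grasp candidates followed by selection of a cooperative pair), the sketch below picks one grasp per arm. It is not the authors' selection procedure: the `Grasp` type, the `score` field, the summed-score criterion, and the `min_separation` threshold are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Grasp:
    """Hypothetical grasp candidate; the fields are assumptions for this sketch."""
    position: Tuple[float, float, float]  # grasp centre in metres
    score: float                          # quality score, higher is better


def select_dual_arm_pair(
    left: List[Grasp],
    right: List[Grasp],
    min_separation: float = 0.10,  # assumed minimum distance between grasp centres
) -> Optional[Tuple[Grasp, Grasp]]:
    """Return the (left, right) pair with the highest summed score whose
    centres are at least min_separation apart, or None if no pair qualifies."""
    best_pair = None
    best_total = -math.inf
    for g_left, g_right in product(left, right):
        # Reject pairs whose grasp centres are too close together.
        if math.dist(g_left.position, g_right.position) < min_separation:
            continue
        total = g_left.score + g_right.score
        if total > best_total:
            best_pair, best_total = (g_left, g_right), total
    return best_pair


# Toy candidates standing in for the per-arm outputs of a grasp generator.
left_candidates = [Grasp((0.00, 0.0, 0.1), 0.9), Grasp((0.05, 0.0, 0.1), 0.6)]
right_candidates = [Grasp((0.04, 0.0, 0.1), 0.95), Grasp((0.30, 0.0, 0.1), 0.7)]
pair = select_dual_arm_pair(left_candidates, right_candidates)
```

In this toy data the highest-scoring right-arm grasp is rejected because it lies within a few centimetres of both left-arm candidates, so the selected pair combines the best left-arm grasp with the more distant right-arm grasp. A real system would also need to account for orientation, reachability, and force closure, none of which are modelled here.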
May-13-2025
- Genre:
  - Research Report (0.70)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Robots (1.00)
    - Machine Learning (1.00)
    - Representation & Reasoning > Constraint-Based Reasoning (0.49)
    - Natural Language > Large Language Model (0.48)