T(R,O) Grasp: Efficient Graph Diffusion of Robot-Object Spatial Transformation for Cross-Embodiment Dexterous Grasping

Xin Fei, Zhixuan Xu, Huaicong Fang, Tianrui Zhang, Lin Shao

arXiv.org Artificial Intelligence 

Figure 1: Given an object point cloud and a hand URDF, T(R,O) Grasp efficiently supports both conditioned and unconditioned grasp synthesis using a graph diffusion model. Compared with D(R,O) Grasp [1], our method achieves superior performance with lower memory usage and significantly higher inference speed and throughput.

Abstract -- Dexterous grasping remains a central challenge in robotics due to the complexity of its high-dimensional state and action spaces. We introduce T(R,O) Grasp, a diffusion-based framework that efficiently generates accurate and diverse grasps across multiple robotic hands. At its core is the T(R,O) Graph, a unified representation that models spatial transformations between robotic hands and objects while encoding their geometric properties. A graph diffusion model, coupled with an efficient inverse kinematics solver, supports both unconditioned and conditioned grasp synthesis. Extensive experiments on a diverse set of dexterous hands show that T(R,O) Grasp achieves an average success rate of 94.83%, an inference speed of 0.21 s, and a throughput of 41 grasps per second on an NVIDIA A100 40GB GPU, substantially outperforming existing baselines. In addition, our approach is robust and generalizable across embodiments while significantly reducing memory consumption.

Grasping with dexterous hands is a fundamental capability for achieving precise, human-level manipulation. Yet, efficiently generating diverse and high-quality grasps remains a longstanding challenge, largely due to the high dimensionality of dexterous hands and the difficulty of ensuring both stability and precision.
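To make the abstract's central idea concrete, the following is a minimal, illustrative sketch (not the paper's actual model or API) of diffusing relative hand-object spatial features attached to the edges of a bipartite robot-object graph. Edge features here are plain translation vectors standing in for full transformations, the denoiser is an oracle that knows the true noise, and all names (`q_sample`, `p_step`, node counts, the noise schedule) are assumptions chosen for the demo; a real system would replace the oracle with a learned graph network conditioned on node geometry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy robot-object graph: hand-link nodes, object nodes, and one edge
# feature per (hand, object) pair holding a relative translation
# (a simplified stand-in for a full spatial transformation).
num_hand, num_obj, dim = 4, 6, 3
edges = [(h, o) for h in range(num_hand) for o in range(num_obj)]
x0 = rng.normal(size=(len(edges), dim))  # "clean" relative transforms

# Linear noise schedule for a DDPM-style diffusion over edge features.
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """Forward process: corrupt clean edge features to noise level t."""
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

def p_step(xt, t, eps_hat, z):
    """One reverse (denoising) step given a noise prediction eps_hat."""
    coef = betas[t] / np.sqrt(1.0 - abar[t])
    mean = (xt - coef * eps_hat) / np.sqrt(alphas[t])
    sigma = np.sqrt(betas[t]) if t > 0 else 0.0  # no noise at the last step
    return mean + sigma * z

# Fully noise the edge features, then denoise back. The oracle below
# predicts the exact noise; a learned model would be substituted here.
eps = rng.normal(size=x0.shape)
xt = q_sample(x0, T - 1, eps)
for t in reversed(range(T)):
    eps_hat = (xt - np.sqrt(abar[t]) * x0) / np.sqrt(1.0 - abar[t])  # oracle
    z = rng.normal(size=x0.shape)
    xt = p_step(xt, t, eps_hat, z)
# With the oracle denoiser, the reverse chain recovers x0 exactly;
# the recovered edge transforms would then feed an IK solver to get
# joint configurations for the target hand.
```

In the paper's pipeline the analogous denoised edge transformations are mapped to joint angles by an inverse kinematics solver, which is what makes the representation embodiment-agnostic: only the graph's node geometry changes across hands.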