Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping

Huang, Ziye, Yuan, Haoqi, Fu, Yuhui, Lu, Zongqing

arXiv.org Artificial Intelligence 

Ziye Huang 1, Haoqi Yuan 1, Yuhui Fu 1, Zongqing Lu 1,2
1 Peking University, 2 Beijing Academy of Artificial Intelligence

Abstract

Universal dexterous grasping across diverse objects presents a fundamental yet formidable challenge in robot learning. Existing approaches using reinforcement learning (RL) to develop policies on extensive object datasets face critical limitations, including complex curriculum design for multi-task learning and limited generalization to unseen objects. To overcome these challenges, we introduce ResDex, a novel approach that integrates residual policy learning with a mixture-of-experts (MoE) framework. ResDex is distinguished by its use of geometry-unaware base policies that are efficiently acquired on individual objects and capable of generalizing across a wide range of unseen objects. Our MoE framework incorporates several base policies to facilitate diverse grasping styles suitable for various objects. By learning residual actions alongside weights that combine these base policies, ResDex enables efficient multi-task RL for universal dexterous grasping. ResDex achieves state-of-the-art performance on the DexGraspNet dataset comprising 3,200 objects with an 88.8% success rate. It exhibits no generalization gap with unseen objects and demonstrates superior training efficiency, mastering all tasks within only 12 hours on a single GPU.

1 Introduction

Dexterous robotic hands (Pons et al., 1999; Shaw et al., 2023) provide advanced capabilities for complex grasping tasks, similar to those performed by human hands. However, achieving universal dexterous grasping across a wide range of objects remains a significant challenge due to the high degrees of freedom (DoFs) of dexterous hands and the high variability in object geometry in the real world.
Previous works (Qin et al., 2022a; Agarwal et al., 2023) develop dexterous grasping policies using reinforcement learning (RL), but these policies are limited to a small range of objects that are similar to the training objects.
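
To make the abstract's description of "residual actions alongside weights that combine these base policies" concrete, the following is a minimal sketch of one plausible reading: the final action is a weighted sum of base-policy actions plus a residual correction. The function names, the softmax normalization of the weights, and the three-dimensional toy actions are illustrative assumptions and are not taken from the ResDex implementation.

```python
import math


def softmax(logits):
    """Normalize gate logits into weights that sum to 1 (an assumed choice)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_actions(base_actions, gate_logits, residual):
    """Mix K base-policy actions with learned weights, then add a residual.

    base_actions: list of K action vectors, one per base policy
    gate_logits:  list of K unnormalized mixing scores (hypothetical gate output)
    residual:     one action vector of the same dimension as each base action
    """
    weights = softmax(gate_logits)
    dim = len(residual)
    # Weighted sum over base policies, computed per action dimension.
    mixed = [
        sum(w * a[j] for w, a in zip(weights, base_actions))
        for j in range(dim)
    ]
    # The residual corrects the mixed base action.
    return [m_j + r_j for m_j, r_j in zip(mixed, residual)]


# Toy example: two base policies, a 3-dimensional action, equal gate scores.
base_actions = [[0.2, -0.1, 0.5], [0.0, 0.3, 0.1]]
gate_logits = [0.0, 0.0]
residual = [0.05, 0.0, -0.1]
action = combine_actions(base_actions, gate_logits, residual)
```

In this sketch the base policies are treated as fixed functions whose outputs are already computed; in an RL setting, only the gate scores and the residual would be produced by the trainable policy.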