H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation
Neural Information Processing Systems
Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. Our framework, H-InDex, consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) adapting the representations offline with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. Together, the last two stages modify only 0.36% of the parameters of the pre-trained representation, so that the knowledge from pre-training is retained as fully as possible. We empirically study 12 challenging dexterous manipulation tasks and find that H-InDex largely surpasses strong baseline methods and recent visual foundation models for motor control. Code and videos are available at https://yanjieze.com/H-InDex.
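The abstract does not spell out how stage (iii) updates BatchNorm, so the following is only a minimal sketch of one common reading: the running mean and variance are moved toward each batch's statistics by an exponential moving average, and nothing else in the layer changes. The class name `EMABatchNorm1D`, the helper `ema_update`, the single-feature simplification, and the default momentum are all assumptions made for illustration, not details taken from the paper or its code.

```python
from statistics import fmean, pvariance


def ema_update(running, batch_value, momentum):
    """Blend a running statistic with a new batch statistic."""
    return (1.0 - momentum) * running + momentum * batch_value


class EMABatchNorm1D:
    """Single-feature BatchNorm whose only adapted state is its running statistics.

    Hypothetical illustration: the momentum and eps defaults are not from the paper.
    """

    def __init__(self, momentum=0.01, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0

    def update(self, batch):
        # Move the running statistics toward the current batch statistics.
        self.running_mean = ema_update(self.running_mean, fmean(batch), self.momentum)
        self.running_var = ema_update(self.running_var, pvariance(batch), self.momentum)

    def normalize(self, x):
        # Normalize with the running statistics rather than the batch statistics.
        return (x - self.running_mean) / (self.running_var + self.eps) ** 0.5
```

A small momentum makes the statistics drift slowly toward the new data distribution, which fits the stated goal of changing very little of the pre-trained representation.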
Jan-20-2025, 01:31:01 GMT