H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation

Jan-20-2025, 01:31:01 GMT – Neural Information Processing Systems

Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. Our framework, H-InDex, consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adaptation of the representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. The last two stages modify only 0.36% of the parameters of the pre-trained representation in total, ensuring that the knowledge from pre-training is maintained to the full extent. We empirically study 12 challenging dexterous manipulation tasks and find that H-InDex largely surpasses strong baseline methods and recent visual foundation models for motor control. Code and videos are available at https://yanjieze.com/H-InDex.
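The abstract does not spell out what "exponential moving average BatchNorm" computes. As a rough illustration only, the sketch below assumes it means blending BatchNorm running statistics with the statistics of each new batch, which is how running statistics are commonly maintained. The function names, the momentum value, and the choice to update only the running mean and variance are assumptions made for this example, not details taken from the paper.

```python
from statistics import fmean, pvariance


def ema_update(running, batch_value, momentum):
    """Blend a running statistic with the statistic of a new batch."""
    return (1.0 - momentum) * running + momentum * batch_value


def update_bn_stats(running_mean, running_var, batch, momentum=0.01):
    """Return EMA-updated (mean, variance) for one feature channel.

    `momentum` is a hypothetical value chosen for illustration.
    """
    batch_mean = fmean(batch)
    batch_var = pvariance(batch, mu=batch_mean)  # biased (population) variance
    new_mean = ema_update(running_mean, batch_mean, momentum)
    new_var = ema_update(running_var, batch_var, momentum)
    return new_mean, new_var


def normalize(x, mean, var, eps=1e-5):
    """Normalize one activation with the current running statistics."""
    return (x - mean) / (var + eps) ** 0.5


# Usage: statistics drift slowly toward those of the new data.
mean, var = 0.0, 1.0
for batch in ([2.0, 4.0], [3.0, 5.0], [2.5, 3.5]):
    mean, var = update_bn_stats(mean, var, batch, momentum=0.1)
```

A small momentum keeps the normalization statistics close to their pre-trained values while still letting them track the distribution of observations seen during reinforcement learning; under this reading, no learned weights need to change for the statistics to adapt.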



Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (0.71)
  - Robots > Manipulation (0.65)