HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos
Haoyang Weng, Yitang Li, Nikhil Sobanbabu, Zihan Wang, Zhengyi Luo, Tairan He, Deva Ramanan, Guanya Shi
–arXiv.org Artificial Intelligence
The authors are with the Robotics Institute, Carnegie Mellon University, USA.

Figure 1: HDMI enables humanoid robots to acquire diverse whole-body interaction skills directly from human videos.

Abstract: Enabling robust whole-body humanoid-object interaction (HOI) remains challenging due to the scarcity of motion data and the contact-rich nature of the tasks. We present HDMI (HumanoiD iMitation for Interaction), a simple and general framework that learns whole-body humanoid-object interaction skills directly from monocular RGB videos. Our pipeline (i) extracts and retargets human and object trajectories from unconstrained videos to build structured motion datasets, (ii) trains a rein- [...] Extensive sim-to-real experiments on a Unitree G1 humanoid demonstrate the robustness and generality of our approach: HDMI achieves 67 consecutive door traversals and successfully performs 6 distinct loco-manipulation tasks in the real world and 14 tasks in simulation. Our results establish HDMI as a simple and general framework for acquiring interactive humanoid skills from human videos.

Humanoid robots hold immense potential for assisting humans in diverse environments due to their human-like morphology and versatility.
Sep-30-2025