VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
Shaofeng Yin, Yanjie Ze, Hong-Xing Yu, C. Karen Liu, Jiajun Wu
arXiv.org Artificial Intelligence
Humanoid loco-manipulation in unstructured environments demands tight integration of egocentric perception and whole-body control. However, existing approaches either depend on external motion capture systems or fail to generalize across diverse tasks. We introduce VisualMimic, a visual sim-to-real framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic combines a task-agnostic low-level keypoint tracker -- trained from human motion data via a teacher-student scheme -- with a task-specific high-level policy that generates keypoint commands from visual and proprioceptive input. To ensure stable training, we inject noise into the low-level policy and clip high-level actions using human motion statistics. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots, accomplishing a wide range of loco-manipulation tasks such as box lifting, pushing, football dribbling, and kicking. Beyond controlled laboratory settings, our policies also generalize robustly to outdoor environments. Videos are available at: https://visualmimic.github.io .
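The abstract mentions two training stabilizers: noise injected into the low-level policy's commands and high-level actions clipped using human motion statistics. The paper's exact formulation is not given here, so the following is a minimal NumPy sketch under assumed choices (per-dimension 3-sigma clipping bounds, Gaussian command noise); all names and constants are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for keypoint trajectories extracted from a human motion dataset
# (hypothetical: 1000 frames, 9-D keypoint command space).
motion_data = rng.normal(size=(1000, 9))
mean, std = motion_data.mean(axis=0), motion_data.std(axis=0)
# Assumed clipping envelope: mean +/- 3 standard deviations per dimension.
lo, hi = mean - 3.0 * std, mean + 3.0 * std

def clip_high_level_action(action: np.ndarray) -> np.ndarray:
    """Keep high-level keypoint commands inside the human-motion envelope."""
    return np.clip(action, lo, hi)

def noisy_low_level_command(command: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Perturb the command seen by the low-level tracker during training,
    so it stays robust to imperfect high-level outputs."""
    return command + rng.normal(scale=sigma, size=command.shape)

# Example: an out-of-distribution command gets pulled back into range
# before being perturbed and handed to the low-level tracker.
raw = rng.normal(scale=10.0, size=9)
safe = clip_high_level_action(raw)
tracked = noisy_low_level_command(safe)
```

Clipping keeps the high-level policy from commanding keypoint targets the tracker never saw in human data, while the injected noise forces the tracker to tolerate the imperfect commands a learned high-level policy produces.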
Nov-14-2025
- Genre:
- Research Report (0.50)
- Industry:
- Education (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Robots (1.00)
- Vision > Image Understanding (0.40)