VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation
He, Tairan, Wang, Zi, Xue, Haoru, Ben, Qingwei, Luo, Zhengyi, Xiao, Wenli, Yuan, Ye, Da, Xingye, Castañeda, Fernando, Sastry, Shankar, Liu, Changliu, Shi, Guanya, Fan, Linxi, Zhu, Yuke
arXiv.org Artificial Intelligence
A key barrier to the real-world deployment of humanoid robots is the lack of autonomous loco-manipulation skills. We introduce VIRAL, a visual sim-to-real framework that learns humanoid loco-manipulation entirely in simulation and deploys it zero-shot to real hardware. VIRAL follows a teacher-student design: a privileged RL teacher, operating on full state, learns long-horizon loco-manipulation using a delta action space and reference state initialization. A vision-based student policy is then distilled from the teacher via large-scale simulation with tiled rendering, trained with a mixture of online DAgger and behavior cloning. We find that compute scale is critical: scaling simulation to tens of GPUs (up to 64) makes both teacher and student training reliable, while low-compute regimes often fail. To bridge the sim-to-real gap, VIRAL combines large-scale visual domain randomization (over lighting, materials, camera parameters, image quality, and sensor delays) with real-to-sim alignment of the dexterous hands and cameras. Deployed on a Unitree G1 humanoid, the resulting RGB-based policy performs continuous loco-manipulation for up to 54 cycles, generalizing to diverse spatial and appearance variations without any real-world fine-tuning, and approaching expert-level teleoperation performance. Extensive ablations dissect the key design choices required to make RGB-based humanoid loco-manipulation work in practice.
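The abstract's "mixture of online DAgger and behavior cloning" can be sketched as a data-collection loop that interleaves stepping the environment with the teacher's action (behavior cloning) and the student's action (DAgger), while always recording the privileged teacher's action as the supervision label. This is a minimal illustrative sketch, not the paper's implementation; the `ToyEnv` class and all interfaces here are hypothetical placeholders.

```python
import random

class ToyEnv:
    """Hypothetical stand-in environment: the state is just a step counter."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t                      # student observation
    def privileged_state(self):
        return self.t                      # full state visible only to the teacher
    def step(self, action):
        self.t += 1
        return self.t

def collect_distillation_batch(env, teacher_act, student_act, n_steps, bc_prob, rng):
    """Collect (observation, teacher_action) pairs for supervised distillation.

    With probability bc_prob the rollout follows the teacher (behavior cloning);
    otherwise it follows the student (online DAgger). Either way, the teacher's
    action on the privileged state is stored as the label.
    """
    dataset = []
    obs = env.reset()
    for _ in range(n_steps):
        label = teacher_act(env.privileged_state())
        dataset.append((obs, label))
        action = label if rng.random() < bc_prob else student_act(obs)
        obs = env.step(action)
    return dataset

rng = random.Random(0)
# Toy teacher doubles the state; toy student always outputs 0.
batch = collect_distillation_batch(ToyEnv(), lambda s: s * 2, lambda o: 0,
                                   n_steps=5, bc_prob=0.5, rng=rng)
# batch == [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```

Setting `bc_prob=1.0` reduces this to pure behavior cloning on the teacher's state distribution, while `bc_prob=0.0` is pure on-policy DAgger; the paper's mixture sits between the two.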
Dec-1-2025