Humanoid World Models: Open World Foundation Models for Humanoid Robotics
Ali, Muhammad Qasim; Sridhar, Aditya; Matiana, Shahbuland; Wong, Alex; Al-Sharman, Mohammad
arXiv.org Artificial Intelligence
Humanoid robots, with their human-like form, are uniquely suited for interacting in environments built for people. However, enabling humanoids to reason, plan, and act in complex open-world settings remains a challenge. World models, models that predict the future outcome of a given action, can support these capabilities by serving as a dynamics model in long-horizon planning and generating synthetic data for policy learning. We introduce Humanoid World Models (HWM), a family of lightweight, open-source models that forecast future egocentric video conditioned on humanoid control tokens. We train two types of generative models, Masked Transformers and Flow-Matching, on 100 hours of humanoid demonstrations. Additionally, we explore architectural variants with different attention mechanisms and parameter-sharing strategies. Our parameter-sharing techniques reduce model size by 33-53% with minimal impact on performance or visual fidelity. HWMs are designed to be trained and deployed in practical academic and small-lab settings, for example on 1-2 GPUs.
Jul-10-2025
- Country:
  - North America (0.28)
- Genre:
  - Research Report (0.55)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Robots (1.00)
    - Representation & Reasoning (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Cognitive Science > Problem Solving (0.93)
    - Natural Language > Large Language Model (0.68)