GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment
He, Haoyang, Patrikar, Jay, Kim, Dong-Ki, Smith, Max, McGann, Daniel, Agha-mohammadi, Ali-akbar, Omidshafiei, Shayegan, Scherer, Sebastian
arXiv.org Artificial Intelligence
Recent advances in video world modeling have enabled large-scale generative models to simulate embodied environments with high visual fidelity, providing strong priors for prediction, planning, and control. Yet, despite their realism, these models often lack geometric grounding, limiting their use in navigation tasks that require spatial coherence and long-horizon stability. We introduce Reinforcement Learning with World Grounding (RLWG), a self-supervised post-training framework that aligns pretrained world models with a physically verifiable structure through geometric and perceptual rewards. Analogous to reinforcement learning from verifiable feedback (RLVR) in language models, RLWG can use multiple rewards that measure pose cycle-consistency, depth reprojection, and temporal coherence. We instantiate this framework with GrndCtrl, a reward-aligned adaptation method based on Group Relative Policy Optimization (GRPO), yielding world models that maintain stable trajectories, consistent geometry, and reliable rollouts for embodied navigation. Like post-training alignment in large language models, GrndCtrl leverages verifiable rewards to bridge generative pretraining and grounded behavior, achieving superior spatial coherence and navigation stability over supervised fine-tuning in outdoor environments.
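As a rough illustration of the group-relative idea behind GRPO mentioned in the abstract, the sketch below combines several per-rollout reward terms into one score and standardizes the scores within a group of rollouts. It is not the authors' implementation: the reward names (`pose_cycle`, `depth_reproj`, `temporal`), the equal weights, the numeric values, the weighted-sum combination, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def combine_rewards(components, weights):
    """Weighted sum of one rollout's reward terms (hypothetical combination rule)."""
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize scalar rewards within one group of rollouts.

    Each rollout's advantage is its reward minus the group mean, divided by
    the group standard deviation (population std here, an assumption).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward terms for four rollouts sampled from the same context.
rollouts = [
    {"pose_cycle": 0.9, "depth_reproj": 0.8, "temporal": 0.7},
    {"pose_cycle": 0.4, "depth_reproj": 0.5, "temporal": 0.6},
    {"pose_cycle": 0.7, "depth_reproj": 0.6, "temporal": 0.9},
    {"pose_cycle": 0.2, "depth_reproj": 0.3, "temporal": 0.4},
]
weights = {"pose_cycle": 1.0, "depth_reproj": 1.0, "temporal": 1.0}  # assumed equal

scores = [combine_rewards(r, weights) for r in rollouts]
advantages = group_relative_advantages(scores)
```

Rollouts that score above the group mean receive positive advantages and those below receive negative ones, so no separate learned value function is needed to form the baseline.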
Dec-2-2025
- Genre:
- Research Report (0.64)
- Technology:
- Information Technology > Artificial Intelligence
- Representation & Reasoning (1.00)
- Natural Language (1.00)
- Machine Learning (1.00)
- Cognitive Science > Problem Solving (0.85)