GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment

He, Haoyang, Patrikar, Jay, Kim, Dong-Ki, Smith, Max, McGann, Daniel, Agha-mohammadi, Ali-akbar, Omidshafiei, Shayegan, Scherer, Sebastian

Dec-2-2025–arXiv.org Artificial Intelligence

Recent advances in video world modeling have enabled large-scale generative models to simulate embodied environments with high visual fidelity, providing strong priors for prediction, planning, and control. Y et, despite their realism, these models often lack geometric grounding, limiting their use in navigation tasks that require spatial coherence and long-horizon stability. W e introduce Reinforcement Learning with World Grounding (RLWG), a self-supervised post-training framework that aligns pretrained world models with a physically verifiable structure through geometric and perceptual rewards. Analogous to reinforcement learning from verifiable feedback (RLVR) in language models, RLWG can use multiple rewards that measure pose cycle-consistency, depth reprojection, and temporal coherence. W e instantiate this framework with Grnd-Ctrl, a reward-aligned adaptation method based on Group Relative Policy Optimization (GRPO), yielding world models that maintain stable trajectories, consistent geometry, and reliable rollouts for embodied navigation. Like post-training alignment in large language models, GrndCtrl leverages verifiable rewards to bridge generative pretrain-ing and grounded behavior, achieving superior spatial coherence and navigation stability over supervised fine-tuning in outdoor environments.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Dec-2-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning (1.00)
  - Cognitive Science > Problem Solving (0.85)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found