WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation

Zezhong Qian, Xiaowei Chi, Yuming Li, Shizun Wang, Zhiyuan Qin, Xiaozhu Ju, Sirui Han, Shanghang Zhang

arXiv.org Artificial Intelligence 

Wrist-view observations are crucial for VLA models, as they capture the fine-grained hand-object interactions that directly enhance manipulation performance. Yet large-scale datasets rarely include such recordings, resulting in a substantial gap between abundant anchor views and scarce wrist views. Existing world models cannot bridge this gap, as they require a wrist-view first frame and thus fail to generate wrist-view videos from anchor views alone. Amid this gap, recent visual geometry models such as VGGT provide precisely the geometric and cross-view priors needed to handle such extreme viewpoint shifts. Building on these insights, we propose WristWorld, the first 4D world model that generates wrist-view videos solely from anchor views. WristWorld operates in two stages: (i) Reconstruction, which extends VGGT and incorporates our Spatial Projection Consistency (SPC) Loss to estimate geometrically consistent wrist-view poses and 4D point clouds; (ii) Generation, which employs our video generation model to synthesize temporally coherent wrist-view videos from the reconstructed perspective. Experiments on Droid, Calvin, and Franka Panda demonstrate state-of-the-art video generation with superior spatial consistency, while also improving VLA performance, raising the average task completion length on Calvin by 3.81% and closing 42.4% of the anchor-wrist view gap. The generated wrist observations effectively expand the training data to novel views and yield significant performance improvements for downstream VLA models across diverse tasks.
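The abstract does not spell out the SPC Loss, but projection-consistency objectives of this kind usually penalize disagreement between a shared 3D reconstruction and its reprojections under the estimated camera poses. Below is a minimal PyTorch sketch under that assumption; the function names, tensor shapes, pinhole model, and Huber penalty are illustrative choices, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def project(points_cam: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    # Pinhole projection: camera-frame points (N, 3) -> pixel coords (N, 2).
    uv = points_cam @ K.T
    return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)

def spatial_projection_consistency(points_world, poses, intrinsics, observed_uv):
    """Hypothetical multi-view reprojection consistency loss.

    points_world : (N, 3) shared point cloud for one timestep
    poses        : list of (4, 4) world-to-camera extrinsics, one per view
                   (including the estimated wrist-view pose)
    intrinsics   : list of (3, 3) camera intrinsics, one per view
    observed_uv  : list of (N, 2) pixel correspondences, one per view
    """
    loss = points_world.new_zeros(())
    for T, K, uv_obs in zip(poses, intrinsics, observed_uv):
        R, t = T[:3, :3], T[:3, 3]
        pts_cam = points_world @ R.T + t   # transform world -> camera frame
        uv_pred = project(pts_cam, K)      # reproject into this view
        loss = loss + F.huber_loss(uv_pred, uv_obs)
    return loss / len(poses)
```

In a setup like the one the abstract describes, such a term would presumably be minimized jointly with the reconstruction objectives, so that the estimated wrist-view pose and the 4D point cloud remain mutually consistent before the generation stage renders videos from that perspective.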