Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Haoyu Wu, Diankun Wu, Tianyu He, Junliang Guo, Yang Ye, Yueqi Duan, Jiang Bian

Jul-11-2025, arXiv.org Artificial Intelligence

Videos inherently represent 2D projections of a dynamic 3D world. However, our analysis suggests that video diffusion models trained solely on raw video data often fail to capture meaningful geometry-aware structure in their learned representations. To bridge this gap between video diffusion models and the underlying 3D nature of the physical world, we propose Geometry Forcing, a simple yet effective method that encourages video diffusion models to internalize latent 3D representations. Our key insight is to guide the model's intermediate representations toward geometry-aware structure by aligning them with features from a pretrained geometric foundation model. To this end, we introduce two complementary alignment objectives: Angular Alignment, which enforces directional consistency via cosine similarity, and Scale Alignment, which preserves scale-related information by regressing unnormalized geometric features from normalized diffusion representations. We evaluate Geometry Forcing on both camera view-conditioned and action-conditioned video generation tasks. Experimental results demonstrate that our method substantially improves visual quality and 3D consistency over the baseline methods. Project page: https://GeometryForcing.github.io.
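The two alignment objectives named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the `1 - cosine` form of the angular loss, the mean-squared-error form of the scale loss, and the linear regression head (`weight`, `bias`) are all assumptions made for illustration. The sketch also assumes the diffusion features have already been projected to the same dimension as the geometric features.

```python
import math


def l2_normalize(v, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def angular_alignment_loss(diff_feats, geo_feats):
    """Mean (1 - cosine similarity) over paired feature vectors.

    Assumed form: the abstract only says directional consistency is
    enforced via cosine similarity.
    """
    total = 0.0
    for d, g in zip(diff_feats, geo_feats):
        d_hat, g_hat = l2_normalize(d), l2_normalize(g)
        total += 1.0 - sum(a * b for a, b in zip(d_hat, g_hat))
    return total / len(diff_feats)


def scale_alignment_loss(diff_feats, geo_feats, weight, bias):
    """MSE between unnormalized geometric features and a prediction
    made from normalized diffusion features.

    The linear head (weight, bias) is a hypothetical stand-in for
    whatever regressor the method actually uses.
    """
    total, count = 0.0, 0
    for d, g in zip(diff_feats, geo_feats):
        d_hat = l2_normalize(d)  # direction only; magnitude is discarded
        pred = [
            sum(w * x for w, x in zip(row, d_hat)) + b
            for row, b in zip(weight, bias)
        ]
        total += sum((p - t) ** 2 for p, t in zip(pred, g))
        count += len(g)
    return total / count
```

In a training loop these two terms would presumably be added to the usual diffusion loss with weighting coefficients; the abstract does not state how they are combined, so any such weights are left unspecified here.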

diffusion model, large language model, machine learning

Country:
- Asia
  - Japan > Honshū
    - Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
  - Middle East > Saudi Arabia
    - Northern Borders Province > Arar (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (1.00)
  - Natural Language > Large Language Model (0.68)
  - Vision (1.00)
