MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models

Fang, Shaoheng, Yu, Chaohui, Wang, Fan, Huang, Qixing

Dec-5-2025–arXiv.org Artificial Intelligence

We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to bridge the 3D layout and consistent image-based condition signals for multi-view generation. The second stage performs image-conditioned multi-view generation, incorporating a layout-aware epipolar attention mechanism to enhance multi-view consistency during the diffusion process. Additionally, we introduce an iterative framework that generates 3D scenes with varying numbers of objects and scene complexities by recursively performing multi-view generation (MVRoom), supporting text-to-scene generation. Experimental results demonstrate that our approach achieves high-fidelity and controllable 3D scene generation for NVS, outperforming state-of-the-art baseline methods both quantitatively and qualitatively.
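
The abstract does not spell out how the layout-aware epipolar attention is built, so the following is only a generic sketch of the plain epipolar constraint that such mechanisms usually start from, not the authors' implementation. It computes, for every pixel in one view, which pixels in a second view lie close to its epipolar line; a mask like this could restrict cross-view attention. The function names (`skew`, `fundamental_matrix`, `epipolar_mask`), the pose convention `X2 = R @ X1 + t`, and the pixel `threshold` are all assumptions made for illustration, and the layout-aware part is not modeled here.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def fundamental_matrix(K1, K2, R, t):
    """Fundamental matrix F with x2^T F x1 = 0 for corresponding pixels.

    Assumed convention: a point X1 in camera-1 coordinates maps to
    camera-2 coordinates as X2 = R @ X1 + t.
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)


def epipolar_mask(F, h, w, threshold=1.5):
    """Boolean mask of shape (h*w, h*w).

    Entry [i, j] is True when pixel j of view 2 lies within `threshold`
    pixels of the epipolar line induced by pixel i of view 1.
    Pixels are flattened in row-major order.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    n = h * w
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(n)], axis=1).astype(float)
    lines = pix @ F.T  # row i holds the line F @ x_i in view 2
    norm = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    norm = np.maximum(norm, 1e-12)  # guard against degenerate lines
    dist = np.abs(lines @ pix.T) / norm  # point-to-line distances
    return dist <= threshold


if __name__ == "__main__":
    # Hypothetical setup: identical intrinsics, pure sideways translation.
    K = np.array([[50.0, 0.0, 2.0],
                  [0.0, 50.0, 1.5],
                  [0.0, 0.0, 1.0]])
    F = fundamental_matrix(K, K, np.eye(3), np.array([1.0, 0.0, 0.0]))
    mask = epipolar_mask(F, h=4, w=5, threshold=0.5)
    print(mask.shape, int(mask.sum()))
```

For a pure sideways translation the epipolar lines are horizontal, so each pixel would attend only to pixels in the same image row of the other view. In an attention layer, a mask of this kind is typically applied by setting the disallowed logits to a large negative value before the softmax.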

artificial intelligence, layout, machine learning

Country:
- Asia > China
  - Yunnan Province > Kunming (0.04)
- Europe > Italy
  - Lombardy > Milan (0.04)
- North America > United States
  - District of Columbia > Washington (0.05)
  - Texas > Travis County
    - Austin (0.04)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning (1.00)
    - Vision (1.00)
  - Human Computer Interaction > Interfaces
    - Virtual Reality (0.46)