CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
Nikolai Kalischek, Michael Oechsle, Fabian Manhardt, Philipp Henzler, Konrad Schindler, Federico Tombari
–arXiv.org Artificial Intelligence
We introduce a novel method for generating 360° panoramas from text prompts or images. Our approach leverages recent advances in 3D generation by employing multi-view diffusion models to jointly synthesize the six faces of a cubemap. Unlike previous methods that rely on processing equirectangular projections or autoregressive generation, our method treats each face as a standard perspective image, simplifying the generation process and enabling the use of existing multi-view diffusion models. We demonstrate that these models can be adapted to produce high-quality cubemaps without requiring correspondence-aware attention layers. Our model allows for fine-grained text control, generates high-resolution panorama images and generalizes well beyond its training set, whilst achieving state-of-the-art results, both qualitatively and quantitatively. Project page: https://cubediff.github.io/
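The abstract's key idea rests on cubemap geometry: the 360° sphere is covered by six perspective views, each spanning a 90° field of view, which is what allows an ordinary perspective diffusion model to be reused per face. The minimal sketch below illustrates that geometry only; the face ordering and axis conventions are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

# Minimal sketch of cubemap geometry, for illustration only: the face ordering
# and axis conventions below are assumptions, not the paper's actual layout.

def direction_from_lonlat(lon, lat):
    """Unit viewing ray for an equirectangular coordinate (lon in [-pi, pi], lat in [-pi/2, pi/2])."""
    return np.array([
        np.cos(lat) * np.sin(lon),  # x: right
        np.sin(lat),                # y: up
        np.cos(lat) * np.cos(lon),  # z: forward
    ])

def face_and_uv(d):
    """Map a unit direction to (face index, u, v) on the cube.

    Every face spans a 90-degree field of view, so (u, v) always lands in
    [-1, 1]: each face is an ordinary perspective image, which is the property
    that lets a standard perspective diffusion model synthesize it.
    """
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if az >= ax and az >= ay:            # front (0) / back (1)
        face = 0 if z > 0 else 1
        u, v = (x / az, -y / az) if z > 0 else (-x / az, -y / az)
    elif ax >= ay:                       # right (2) / left (3)
        face = 2 if x > 0 else 3
        u, v = (-z / ax, -y / ax) if x > 0 else (z / ax, -y / ax)
    else:                                # top (4) / bottom (5)
        face = 4 if y > 0 else 5
        u, v = (x / ay, z / ay) if y > 0 else (x / ay, -z / ay)
    return face, u, v

# The ray looking straight ahead hits the centre of the front face.
face, u, v = face_and_uv(direction_from_lonlat(0.0, 0.0))
assert face == 0 and abs(u) < 1e-9 and abs(v) < 1e-9
```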
Jan-28-2025
- Country:
- Asia (0.14)
- Genre:
- Research Report > Promising Solution (0.34)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.68)
- Natural Language (1.00)
- Vision (1.00)