AITopics | mvdiffusion

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion Shitao Tang

Neural Information Processing Systems | Feb-16-2026, 05:42:53 GMT

MVDiffusion processes perspective images in parallel with a pre-trained text-to-image diffusion model, while integrating novel correspondence-aware attention layers to facilitate cross-view interactions.

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Neural Information Processing Systems | Dec-26-2025, 11:12:29 GMT

This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth maps and poses). Unlike prior methods that rely on iterative image warping and inpainting, MVDiffusion simultaneously generates all images with a global awareness, effectively addressing the prevalent error accumulation issue. At its core, MVDiffusion processes perspective images in parallel with a pre-trained text-to-image diffusion model, while integrating novel correspondence-aware attention layers to facilitate cross-view interactions. For panorama generation, while only trained with 10k panoramas, MVDiffusion is able to generate high-resolution photorealistic images for arbitrary texts or extrapolate one perspective image to a 360-degree view. For multi-view depth-to-image generation, MVDiffusion demonstrates state-of-the-art performance for texturing a scene mesh. The project page is at https://mvdiffusion.github.io/.
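The pixel-to-pixel correspondences mentioned in the abstract above can be illustrated for the panorama case. The following is a minimal sketch, not the authors' implementation: it assumes a pinhole camera, crops that share one camera center and differ only by a yaw rotation, and an illustrative 256x256 crop with a 90-degree field of view. The function name `correspond` and all default values are hypothetical.

```python
import math


def correspond(u, v, yaw_deg, width=256, height=256, fov_deg=90.0):
    """Map pixel (u, v) of a source crop to the crop yawed right by yaw_deg.

    Assumption: both crops share one camera center (as crops of a single
    panorama do), so the mapping is a pure rotation and needs no depth.
    Returns None when the ray falls behind the target camera.
    """
    # Pinhole intrinsics: focal length in pixels, principal point at the center.
    f = 0.5 * width / math.tan(math.radians(fov_deg) / 2.0)
    cx, cy = width / 2.0, height / 2.0

    # Back-project the pixel to a ray (x right, y down, z forward).
    dx, dy, dz = (u - cx) / f, (v - cy) / f, 1.0

    # Express the ray in the target camera frame (inverse yaw about the y axis).
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    tx, ty, tz = c * dx - s * dz, dy, s * dx + c * dz
    if tz <= 1e-9:
        return None

    # Project the rotated ray back to pixel coordinates in the target crop.
    return (cx + f * tx / tz, cy + f * ty / tz)
```

Under these assumptions, a pixel on the right edge of the source crop at mid-height lands at the center of a crop yawed by 45 degrees, which is the kind of overlap a cross-view attention layer could exploit. The depth-and-pose case in the abstract would need a full reprojection with depth, which this sketch does not cover.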

correspondence-aware diffusion, enabling holistic multi-view image generation, mvdiffusion, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

a0da690a47b2f52faa63f6fe054057b5-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 03:06:52 GMT

arxiv preprint arxiv, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.99)

StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces

Yeo, Kyeongmin, Kim, Jaihoon, Sung, Minhyuk

arXiv.org Artificial Intelligence | Jan-26-2025

Figure 1: Assorted mesh textures and panoramas generated using StochSync, including one in the background (environment map), which is a 360° panorama. StochSync extends the capabilities of image diffusion models trained in square spaces to produce images in arbitrary spaces such as cylinders, spheres, tori, and mesh surfaces.

We propose a zero-shot method for generating images in arbitrary spaces (e.g., a sphere for 360° [...]). The zero-shot generation of various visual content using a pretrained image diffusion model has been explored mainly in two directions. First, Diffusion Synchronization, which performs reverse diffusion processes jointly across different projected spaces while synchronizing them in the target space, generates high-quality outputs when enough conditioning is provided but struggles in its absence. Second, Score Distillation Sampling, which gradually updates the target space data through gradient descent, yields better coherence but often lacks detail. In this paper, we reveal for the first time the interconnection between these two methods while highlighting their differences. To this end, we propose StochSync, a novel approach that combines the strengths of both, enabling effective performance with weak conditioning. The project page is at https://stochsync.github.io/.

Diffusion models pretrained on billions of images (Rombach et al., 2022; Midjourney) have demonstrated remarkable capabilities in various zero-shot applications.
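The idea of synchronizing several projected views in a shared target space can be shown with a toy example. This is only a sketch under stated assumptions, not the StochSync algorithm: the target space is a one-dimensional ring, each view is a contiguous window onto it, and synchronization is done by plain averaging of overlapping values. The function name `synchronize` and the averaging rule are assumptions made for illustration; the abstract does not specify the aggregation.

```python
def synchronize(views, offsets, target_len):
    """Average overlapping 1D views on a ring, then re-project to each view.

    views      -- list of lists of floats, one list per view
    offsets    -- start index of each view on the ring (hypothetical layout)
    target_len -- number of cells in the ring-shaped target space
    Returns (target, synced_views).
    """
    total = [0.0] * target_len
    count = [0] * target_len

    # Scatter every view into the target space and accumulate.
    for view, off in zip(views, offsets):
        for i, value in enumerate(view):
            j = (off + i) % target_len
            total[j] += value
            count[j] += 1

    # Cells seen by no view stay at 0.0; the others take the mean.
    target = [t / c if c else 0.0 for t, c in zip(total, count)]

    # Gather the agreed-upon values back into each view's own window.
    synced = [
        [target[(off + i) % target_len] for i in range(len(view))]
        for view, off in zip(views, offsets)
    ]
    return target, synced
```

In a diffusion setting, a step like this would be applied to intermediate predictions at each denoising step so that overlapping regions agree. Which quantity is synchronized, and how stochasticity enters, is specific to the paper and is not modeled here.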

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.15445

Country:

South America > Bolivia (0.04)
North America > United States > Kansas (0.04)
Europe > Italy > Tuscany (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.83)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Neural Information Processing Systems | Jan-19-2025, 17:16:03 GMT

correspondence-aware diffusion, enabling holistic multi-view image generation, mvdiffusion, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Multi-view Image Diffusion via Coordinate Noise and Fourier Attention

Theiss, Justin, Müller, Norman, Kim, Daeil, Prakash, Aayush

arXiv.org Artificial Intelligence | Dec-4-2024

Recently, text-to-image generation with diffusion models has advanced significantly in both fidelity and generalization compared to previous baselines. However, generating holistic, multi-view-consistent images from prompts remains an important and challenging task. To address this challenge, we propose a diffusion process that attends to time-dependent spatial frequencies of features with a novel attention mechanism, together with a novel noise initialization technique and a cross-attention loss. This Fourier-based attention block focuses on features from non-overlapping regions of the generated scene in order to better align the global appearance. Our noise initialization technique incorporates shared noise and low spatial frequency information derived from pixel coordinates and depth maps to induce noise correlations across views. The cross-attention loss further aligns features sharing the same prompt across the scene. Our technique improves on the state of the art (SOTA) on several quantitative metrics, with qualitatively better results than other state-of-the-art approaches for multi-view consistency.
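The effect of shared noise on cross-view correlation can be sketched in a few lines. This covers only the shared-noise part of the initialization described above and is not the paper's method: the low-frequency terms derived from pixel coordinates and depth maps are not modeled, and the mixing weight `rho`, the function names, and the flat per-pixel layout are all assumptions made for illustration.

```python
import math
import random


def correlated_view_noise(num_views, num_pixels, rho, seed=0):
    """Draw unit-variance Gaussian noise per view with pairwise correlation rho.

    Each view mixes one shared noise vector with its own private noise:
        eps_i = sqrt(rho) * shared + sqrt(1 - rho) * private_i
    so every entry keeps variance 1 while views become correlated.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    views = []
    for _ in range(num_views):
        private = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
        views.append([a * s + b * p for s, p in zip(shared, private)])
    return views


def correlation(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(vx * vy)
```

Setting `rho` to 0 gives fully independent noise per view and setting it to 1 gives identical noise in every view; values in between trade per-view diversity against a common starting appearance.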

artificial intelligence, consistency, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.03756

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
