3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation

Chen, Hansheng, Shen, Bokui, Liu, Yulin, Shi, Ruoxi, Zhou, Linqi, Lin, Connor Z., Gu, Jiayuan, Su, Hao, Wetzstein, Gordon, Guibas, Leonidas

Oct-24-2024–arXiv.org Artificial Intelligence

Multi-view image diffusion models have significantly advanced open-domain 3D object generation. However, most existing models rely on 2D network architectures that lack inherent 3D biases, resulting in compromised geometric consistency. To address this challenge, we introduce 3D-Adapter, a plug-in module designed to infuse 3D geometry awareness into pretrained image diffusion models. Central to our approach is the idea of 3D feedback augmentation: for each denoising step in the sampling loop, 3D-Adapter decodes intermediate multi-view features into a coherent 3D representation, then re-encodes the rendered RGBD views to augment the pretrained base model through feature addition. We study two variants of 3D-Adapter: a fast feed-forward version based on Gaussian splatting and a versatile training-free version utilizing neural fields and meshes. Our extensive experiments demonstrate that 3D-Adapter not only greatly enhances the geometry quality of text-to-multi-view models such as Instant3D and Zero123++, but also enables high-quality 3D generation using the plain text-to-image Stable Diffusion. Furthermore, we showcase the broad application potential of 3D-Adapter by presenting high-quality results in text-to-3D, image-to-3D, text-to-texture, and text-to-avatar tasks.
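
The 3D feedback augmentation loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name below is hypothetical, views are plain lists of floats rather than images, the "3D representation" is just a per-pixel mean across views, depth (the D in RGBD) is omitted, and the learned encoder is replaced by a fixed weighted difference. The sketch only shows the control flow of one sampling loop: denoise each view with a 2D model, fuse the intermediate views into one shared representation, render it back to every view, and add the re-encoded result to the base output.

```python
import random

def base_denoise(view):
    """Hypothetical stand-in for the pretrained 2D denoiser: acts on one view at a time."""
    return [0.9 * x for x in view]

def reconstruct_3d(views):
    """Toy 'decode to 3D': fuse all views into one shared representation (per-pixel mean)."""
    n = len(views)
    return [sum(px) / n for px in zip(*views)]

def render(shared, num_views):
    """Toy 'render': every view sees the same shared representation."""
    return [list(shared) for _ in range(num_views)]

def encode_feedback(rendered, intermediate, weight):
    """Toy 're-encode': turn rendered views into a correction term (assumed form)."""
    return [[weight * (r - x) for r, x in zip(rv, iv)]
            for rv, iv in zip(rendered, intermediate)]

def sample(views, steps, use_adapter, weight=0.5):
    """Run a toy sampling loop, optionally with 3D feedback at every step."""
    views = [list(v) for v in views]  # copy so the caller's data is untouched
    for _ in range(steps):
        intermediate = [base_denoise(v) for v in views]
        if use_adapter:
            shared = reconstruct_3d(intermediate)
            rendered = render(shared, len(intermediate))
            feedback = encode_feedback(rendered, intermediate, weight)
            # Feature addition: the feedback is summed onto the base output.
            intermediate = [[x + f for x, f in zip(iv, fv)]
                            for iv, fv in zip(intermediate, feedback)]
        views = intermediate
    return views

def cross_view_spread(views):
    """Largest per-pixel disagreement between views (0 means fully consistent)."""
    return max(max(px) - min(px) for px in zip(*views))

if __name__ == "__main__":
    random.seed(0)
    init = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
    print("without feedback:", cross_view_spread(sample(init, 5, False)))
    print("with feedback:   ", cross_view_spread(sample(init, 5, True)))
```

In this toy setting the per-view denoiser cannot reduce disagreement between views on its own, while the feedback term pulls every view toward the shared representation at each step, so the cross-view spread shrinks faster when the adapter path is enabled.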

artificial intelligence, diffusion model, machine learning

Country:
- Asia (0.28)
- North America > United States (0.28)

Genre:
- Research Report (0.50)

Industry:
- Leisure & Entertainment (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.88)
  - Vision (1.00)