Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer

May-27-2025, 19:13:13 GMT–Neural Information Processing Systems

Generating high-quality 3D assets from text and images has long been challenging, primarily due to the absence of scalable 3D representations capable of capturing intricate geometry distributions. In this work, we introduce Direct3D, a native 3D generative model scalable to in-the-wild input images, without requiring a multi-view diffusion model or SDS optimization. Our approach comprises two primary components: a Direct 3D Variational Auto-Encoder (D3D-VAE) and a Direct 3D Diffusion Transformer (D3D-DiT). D3D-VAE efficiently encodes high-resolution 3D shapes into a compact and continuous latent triplane space. Notably, our method directly supervises the decoded geometry using a semi-continuous surface sampling strategy, diverging from previous methods relying on rendered images as supervision signals.

diffusion transformer, direct3d, scalable image-to-3d generation, (2 more...)

Neural Information Processing Systems

May-27-2025, 19:13:13 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)