Autodecoding Latent 3D Diffusion Models
Neural Information Processing Systems
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations to learn a 3D diffusion from 2D images or monocular videos of rigid or articulated objects. Our approach is flexible enough to use either existing camera supervision or no camera information at all, instead efficiently learning it during training. Our evaluations demonstrate that our generation results outperform state-of-the-art alternatives on various benchmark datasets and metrics, including multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale, real video dataset of static objects.
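The pipeline the abstract describes — per-object latent codes learned without an encoder, decoded to a volumetric representation, with normalization/de-normalization applied to the intermediate latents before diffusion — can be sketched at a very high level as follows. All names, shapes, and the linear "decoder" below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Autodecoder: each training object owns a learned latent code (no encoder).
num_objects, latent_dim = 4, 8
latent_codes = rng.normal(size=(num_objects, latent_dim))

# Hypothetical decoder: maps a latent code to a small volumetric grid
# (e.g. density values that could be rendered into view-consistent images).
W = rng.normal(size=(latent_dim, 4 * 4 * 4))
def decode_to_volume(z):
    return (z @ W).reshape(4, 4, 4)

# Robust normalization of the intermediate latent space so a diffusion model
# can operate on roughly unit-scale values; de-normalization inverts it.
mu = latent_codes.mean(axis=0)
sigma = latent_codes.std(axis=0) + 1e-8
normalize = lambda z: (z - mu) / sigma
denormalize = lambda z: z * sigma + mu

# Round-trip check: diffusion would run on normalize(z); generated samples
# are de-normalized, then decoded into a volume for rendering.
z = latent_codes[0]
z_round_trip = denormalize(normalize(z))
volume = decode_to_volume(z_round_trip)
print(volume.shape)                   # (4, 4, 4)
print(np.allclose(z, z_round_trip))   # True
```

This is only a structural sketch: it shows where the normalization and de-normalization operations sit relative to the autodecoder's latents and the volumetric decoding step, not how the diffusion model itself is trained.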