Autodecoding Latent 3D Diffusion Models
Neural Information Processing Systems
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations to learn a 3D diffusion from 2D images or monocular videos of rigid or articulated objects. Our approach is flexible enough to use either existing camera supervision or no camera information at all, instead efficiently learning it during training. Our evaluations demonstrate that our generation results outperform state-of-the-art alternatives on various benchmark datasets and metrics, including multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale, real video dataset of static objects.
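The pipeline the abstract describes — per-object latent codes learned without an encoder, decoded to a volumetric representation, with normalization/de-normalization applied to the intermediate latents before diffusion — can be sketched at a very high level as follows. All names, shapes, and the linear "decoder" below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Autodecoder: each training object owns a learned latent code (no encoder).
num_objects, latent_dim = 4, 8
latent_codes = rng.normal(size=(num_objects, latent_dim))

# Hypothetical decoder: maps a latent code to a small volumetric grid
# (e.g. density values that could be rendered into view-consistent images).
W = rng.normal(size=(latent_dim, 4 * 4 * 4))
def decode_to_volume(z):
    return (z @ W).reshape(4, 4, 4)

# Robust normalization of the intermediate latent space so a diffusion model
# can operate on roughly unit-scale values; de-normalization inverts it.
mu = latent_codes.mean(axis=0)
sigma = latent_codes.std(axis=0) + 1e-8
normalize = lambda z: (z - mu) / sigma
denormalize = lambda z: z * sigma + mu

# Round-trip check: diffusion would run on normalize(z); generated samples
# are de-normalized, then decoded into a volume for rendering.
z = latent_codes[0]
z_round_trip = denormalize(normalize(z))
volume = decode_to_volume(z_round_trip)
print(volume.shape)                   # (4, 4, 4)
print(np.allclose(z, z_round_trip))   # True
```

This is only a structural sketch: it shows where the normalization and de-normalization operations sit relative to the autodecoder's latents and the volumetric decoding step, not how the diffusion model itself is trained.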