Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers

Jun-14-2026, 08:16:59 GMT–Neural Information Processing Systems

Current generative depth estimation models fine-tune Stable Diffusion and achieve impressive performance. However, they require a VAE to compress depth maps into the latent space, which inevitably introduces flying pixels at edges and details.
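The flying-pixel effect can be illustrated with a toy sketch. This is not the paper's method or a real VAE: a 3-tap box blur is assumed here as a hypothetical stand-in for the smoothing that lossy compression can introduce, and the names `box_blur` and `count_flying_pixels` are invented for illustration. Smoothing a sharp depth edge produces depth values that belong to neither the foreground nor the background surface, and those are the points that appear to "fly" between the two surfaces when the depth map is unprojected to 3D.

```python
def box_blur(values, radius):
    """Average each sample with its neighbours, truncating the window at the borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def count_flying_pixels(depth, near, far, tol=1e-6):
    """Count samples whose depth lies strictly between the two true surfaces."""
    return sum(1 for d in depth if near + tol < d < far - tol)


# One scanline across a depth edge: foreground at 1 m, background at 5 m.
near, far = 1.0, 5.0
scanline = [near] * 8 + [far] * 8

# Hypothetical stand-in for lossy compression: a 3-tap box blur.
smoothed = box_blur(scanline, radius=1)

sharp_count = count_flying_pixels(scanline, near, far)
smoothed_count = count_flying_pixels(smoothed, near, far)

print("flying pixels before smoothing:", sharp_count)
print("flying pixels after smoothing:", smoothed_count)
```

In this sketch the original scanline contains only the two true depths, while the smoothed one gains intermediate values at the edge. A model that predicts depth directly in pixel space avoids this particular source of edge smoothing.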

artificial intelligence, machine learning, proceedings

Technology:
- Information Technology > Artificial Intelligence
  - Vision (0.49)
  - Machine Learning (0.40)