MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Ding, Yanbo, Zhuang, Shaobin, Li, Kunchang, Yue, Zhengrong, Qiao, Yu, Wang, Yali
arXiv.org Artificial Intelligence
Despite recent advances in text-to-image generation, most existing methods struggle to create images containing multiple objects with complex spatial relationships in the 3D world. To tackle this limitation, we introduce MUSES, a generic AI system for 3D-controllable image generation from user queries. Specifically, MUSES addresses this challenging task through a progressive workflow with three key components: (1) a Layout Manager for 2D-to-3D layout lifting, (2) a Model Engineer for 3D object acquisition and calibration, and (3) an Image Artist for 3D-to-2D image rendering. By mimicking the collaboration of human professionals, this multi-modal agent pipeline enables the effective, automatic creation of images with 3D-controllable objects through an explainable integration of top-down planning and bottom-up generation. Additionally, we find that existing benchmarks lack detailed descriptions of complex 3D spatial relationships among multiple objects. To fill this gap, we construct a new benchmark, T2I-3DisBench (3D image scene), which describes diverse 3D image scenes with 50 detailed prompts. Extensive experiments show that MUSES achieves state-of-the-art performance on both T2I-CompBench and T2I-3DisBench, outperforming recent strong competitors such as DALL-E 3 and Stable Diffusion 3. These results represent a significant step forward for MUSES in bridging natural language, 2D image generation, and the 3D world.
Aug-21-2024
- Country:
  - Europe > Netherlands
    - North Holland > Amsterdam (0.04)
  - Asia
    - Middle East > Republic of Türkiye
      - Batman Province > Batman (0.05)
    - China
      - Shanghai > Shanghai (0.04)
      - Guangdong Province > Shenzhen (0.04)
- Genre:
  - Research Report > New Finding (0.48)
- Industry:
  - Transportation > Ground (0.46)
  - Information Technology (0.46)
- Technology: