ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation

Zhou, Yuan, Jin, Shilong, Hua, Litao, Lv, Wanjun, Duan, Haoran, Han, Jungong

Dec-11-2025–arXiv.org Artificial Intelligence

Abstract--Recent advances in zero-shot text-to-3D generation have revolutionised 3D content creation by enabling direct synthesis from textual descriptions. While state-of-the-art methods leverage 3D Gaussian Splatting with score distillation to enhance multi-view rendering through pre-trained text-to-image (T2I) models, they suffer from the view biases inherent in the priors of T2I models. These biases lead to inconsistent 3D generation, particularly manifesting as the multi-face Janus problem, where objects exhibit conflicting features across views. To address this fundamental challenge, we propose ConsDreamer, a novel method that mitigates view bias by refining both the conditional and unconditional terms in the score distillation process: (1) a View Disentanglement Module (VDM) that eliminates viewpoint biases in conditional prompts by decoupling irrelevant view components and injecting precise view control; and (2) a similarity-based partial order loss that enforces geometric consistency in the unconditional term by aligning cosine similarities with azimuth relationships. Extensive experiments demonstrate that ConsDreamer can be seamlessly integrated into various 3D representations and score distillation paradigms, effectively mitigating the multi-face Janus problem.

Generation technology plays a crucial role in various fields such as innovative industrial design, game development, and virtual reality. In particular, zero-shot text-to-3D generation [1], [2], [3], [4], [5] aims to generate 3D content without 3D training data, enabling the conversion from concept to reality. Unlike text-to-image (T2I) tasks [10], [11], however, zero-shot text-to-3D generation tasks [6], [7], [8], [9] are constrained by the inherent complexity of the real world and the scarcity of 3D data. From this perspective, generating high-quality 3D content from text remains a significant challenge.
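The abstract describes the similarity-based partial order loss only at a high level: cosine similarities between views should agree with their azimuth relationships. The sketch below is one possible reading of that idea, not the authors' formulation. It assumes each rendered view has a feature vector, picks a reference view, and applies a hinge penalty whenever a view that is farther away in azimuth is more similar to the reference than a nearer one. The function names, the `margin` parameter, and the pairwise hinge form are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def azimuth_gap(a, b):
    """Smallest absolute angular difference in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def partial_order_loss(ref_feat, ref_az, feats, azimuths, margin=0.0):
    """Hypothetical pairwise hinge loss (an assumption, not the paper's loss).

    For every pair of views (i, j) where view i is closer in azimuth to the
    reference than view j, penalise sim(ref, j) exceeding sim(ref, i) - margin.
    Returns the mean penalty over all such ordered pairs.
    """
    sims = [cosine(ref_feat, f) for f in feats]
    gaps = [azimuth_gap(ref_az, a) for a in azimuths]
    total, pairs = 0.0, 0
    for i in range(len(feats)):
        for j in range(len(feats)):
            if gaps[i] < gaps[j]:
                total += max(0.0, sims[j] - sims[i] + margin)
                pairs += 1
    return total / pairs if pairs else 0.0

def unit(deg):
    """Toy 2D feature: a unit vector pointing at the given angle."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))

# Toy check: features that rotate with the camera respect the ordering,
# while the same features assigned to the wrong azimuths violate it.
azimuths = [30.0, 90.0, 180.0]
consistent = [unit(a) for a in azimuths]
scrambled = list(reversed(consistent))
loss_consistent = partial_order_loss(unit(0.0), 0.0, consistent, azimuths)
loss_scrambled = partial_order_loss(unit(0.0), 0.0, scrambled, azimuths)
```

In this toy setup the consistent ordering incurs no penalty and the scrambled one incurs a positive penalty. A training implementation would compute the same quantity on differentiable feature tensors rather than Python lists.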



