Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization
Meng, Debin, Jin, Chen, Gao, Zheng, Li, Yanran, Patras, Ioannis, Tzimiropoulos, Georgios
–arXiv.org Artificial Intelligence
Image diversity remains a fundamental challenge for text-to-image diffusion models. Low-diversity models tend to generate repetitive outputs, increasing sampling redundancy and hindering both creative exploration and downstream applications. A primary cause is that generation often collapses toward a strong mode in the learned distribution. Existing attempts to improve diversity, such as noise resampling, prompt rewriting, or steering-based guidance, often still collapse to dominant modes or introduce distortions that degrade image quality. In light of this, we propose Token-Prompt embedding Space Optimization (TPSO), a training-free and model-agnostic module. TPSO introduces learnable parameters to explore under-represented regions of the token embedding space, reducing the tendency of the model to repeatedly generate samples from strong modes of the learned distribution. At the same time, the prompt-level space provides a global semantic constraint that regulates distribution shifts, preventing quality degradation while maintaining high fidelity. Extensive experiments on MS-COCO and three diffusion backbones show that TPSO significantly enhances generative diversity, improving baseline performance from 1.10 to 4.18 points, without sacrificing image quality. Code will be released upon acceptance.
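The abstract gives no equations for the two-level idea, so the following is only a toy sketch of it and not the authors' implementation. It assumes that the learnable parameters are additive offsets on the token embeddings, that diversity is measured as the squared distance between prompt variants, and that the prompt-level constraint is a quadratic penalty on the shift of the mean-pooled embedding. The function names, the objective, and the hyperparameters (`lr`, `lam`) are all hypothetical, and a random matrix stands in for a real text encoder's output.

```python
import numpy as np


def optimize_token_offsets(token_emb, num_variants=4, steps=20, lr=0.05, lam=20.0, seed=0):
    """Toy gradient ascent on per-token offsets (all modelling choices are assumptions).

    token_emb: array of shape (T, D), one embedding per prompt token.
    Returns an array of shape (num_variants, T, D) of perturbed embeddings.
    """
    rng = np.random.default_rng(seed)
    num_tokens, dim = token_emb.shape
    # Learnable offsets, one set per prompt variant, started near zero.
    delta = 0.01 * rng.standard_normal((num_variants, num_tokens, dim))
    for _ in range(steps):
        # Gradient of 0.5 * sum_{i<j} ||delta_i - delta_j||^2 with respect to delta_k:
        # pushes the variants apart in token embedding space.
        grad_diversity = num_variants * delta - delta.sum(axis=0, keepdims=True)
        # Shift of the mean-pooled (prompt-level) embedding for each variant.
        drift = delta.mean(axis=1, keepdims=True)  # shape (K, 1, D)
        # Gradient of lam * ||drift_k||^2 with respect to each token offset.
        grad_penalty = 2.0 * lam * drift / num_tokens
        delta += lr * (grad_diversity - grad_penalty)
    return token_emb[None, :, :] + delta


def pairwise_diversity(variants):
    """Sum of squared distances between all pairs of variants."""
    total = 0.0
    for i in range(variants.shape[0]):
        for j in range(i + 1, variants.shape[0]):
            total += float(np.sum((variants[i] - variants[j]) ** 2))
    return total


def prompt_drift(variants, token_emb):
    """Largest shift of the mean-pooled embedding across variants."""
    shift = variants.mean(axis=1) - token_emb.mean(axis=0)
    return float(np.max(np.linalg.norm(shift, axis=1)))
```

Calling `optimize_token_offsets` on a `(T, D)` array returns one perturbed embedding per variant. In this toy setting the variants spread apart at the token level while the pooled embedding stays close to the original, which mirrors the trade-off the abstract describes. With mean pooling the penalty only restrains the average offset, so a real system would need a stronger semantic constraint than this sketch uses.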
Nov-26-2025
- Country:
- Asia > Middle East
- Iran > Tehran Province > Tehran (0.04)
- Europe > Switzerland
- Genre:
- Research Report (0.50)
- Technology: