Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models

Zhao, Juntu, Deng, Junyu, Ye, Yixin, Li, Chongxuan, Deng, Zhijie, Wang, Dequan

Aug-5-2024–arXiv.org Artificial Intelligence

Advancements in text-to-image diffusion models have enabled a wide range of downstream practical applications, but such models often suffer from misalignment between text and image. Take the generation of a combination of two disentangled concepts as an example: given the prompt "a tea cup of iced coke", existing models usually generate a glass cup of iced coke, because iced coke usually co-occurs with a glass cup rather than a tea cup during model training. The root of this misalignment is attributed to confusion in the latent semantic space of text-to-image diffusion models, and hence we refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis). We leverage large language models (LLMs) to thoroughly investigate the scope of LC-Mis and develop an automated pipeline for aligning the latent semantics of diffusion models to text prompts. Empirical assessments confirm the effectiveness of our approach, which substantially reduces LC-Mis errors and enhances the robustness and versatility of text-to-image diffusion models. Our code and dataset are available online for reference.

concept pair, latent concept misalignment, tea cup

Country:
- North America > United States
  - New York (0.04)
  - California (0.04)
- Asia > China
  - Shanghai > Shanghai (0.04)
  - Shandong Province (0.04)

Genre:
- Research Report (1.00)

Industry:
- Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
