Compositional Text-to-Image Generation with Dense Blob Representations

Nie, Weili, Liu, Sifei, Mardani, Morteza, Liu, Chao, Eckart, Benjamin, Vahdat, Arash

May-13-2024–arXiv.org Artificial Intelligence

Existing text-to-image models struggle to follow complex text prompts, raising the need for extra grounding inputs for better controllability. In this work, we propose to decompose a scene into visual primitives - denoted as dense blob representations - that contain fine-grained details of the scene while being modular, human-interpretable, and easy-to-construct. Based on blob representations, we develop a blob-grounded text-to-image diffusion model, termed BlobGEN, for compositional generation. Particularly, we introduce a new masked cross-attention module to disentangle the fusion between blob representations and visual features. To leverage the compositionality of large language models (LLMs), we introduce a new in-context learning approach to generate blob representations from text prompts. Our extensive experiments show that BlobGEN achieves superior zero-shot generation quality and better layout-guided controllability on MS-COCO. When augmented by LLMs, our method exhibits superior numerical and spatial correctness on compositional image generation benchmarks.

blob representation, compositional text-to-image generation, representation, (11 more...)

arXiv.org Artificial Intelligence

May-13-2024

arXiv.org PDF

Add feedback

Country:
- Europe
  - Austria > Vienna (0.14)
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)
  - Switzerland > Zürich
    - Zürich (0.14)

Genre:
- Research Report (1.00)

Industry:
- Leisure & Entertainment > Sports
  - Cycling (0.67)
- Media (0.93)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning > Generative AI (0.46)
    - Natural Language > Large Language Model (1.00)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found