T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Feb-11-2025, 17:03:12 GMT–Neural Information Processing Systems

Despite the stunning ability of recent text-to-image models to generate high-quality images, current approaches often struggle to compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed for compositional text-to-image generation and explore the potential and limitations of multimodal LLMs for evaluation. We introduce a new approach, Generative mOdel finetuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. Extensive experiments and evaluations are conducted to benchmark previous methods on T2I-CompBench and to validate the effectiveness of our proposed evaluation metrics and GORS approach.
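
The abstract names reward-driven sample selection but does not spell out the selection rule. As a rough illustration only, the sketch below shows one plausible reading: score each generated sample with a reward function, keep those at or above a threshold, and weight each kept sample's fine-tuning loss by its reward. The names `Sample`, `select_by_reward`, and `reward_weighted_loss`, the threshold rule, and the averaging are all assumptions made for this example, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """A generated image paired with its prompt (hypothetical structure)."""
    prompt: str
    image_id: str        # placeholder handle for a generated image
    reward: float = 0.0  # filled in during selection


def select_by_reward(
    samples: List[Sample],
    reward_fn: Callable[[Sample], float],
    threshold: float,
) -> List[Sample]:
    """Keep samples whose reward is at or above `threshold` (assumed rule)."""
    kept = []
    for s in samples:
        r = reward_fn(s)
        if r >= threshold:
            kept.append(Sample(s.prompt, s.image_id, r))
    return kept


def reward_weighted_loss(losses: List[float], rewards: List[float]) -> float:
    """Average of per-sample losses, each scaled by its reward (assumed form)."""
    if len(losses) != len(rewards):
        raise ValueError("losses and rewards must have the same length")
    if not losses:
        return 0.0
    return sum(l * r for l, r in zip(losses, rewards)) / len(losses)


# Toy usage: a lookup table stands in for a real compositional-alignment
# metric, which in practice would score the image against its prompt.
toy_scores = {"img_a": 0.9, "img_b": 0.3, "img_c": 0.75}
candidates = [Sample("a red book and a blue vase", k) for k in toy_scores]
selected = select_by_reward(candidates, lambda s: toy_scores[s.image_id], 0.75)
```

In a real pipeline the per-sample losses would come from the generative model's training objective, and the reward would come from one of the benchmark's evaluation metrics; both are stubbed out here.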

artificial intelligence, machine learning, text prompt, (15 more...)

Country:
- Asia (0.28)
- Europe > Switzerland (0.28)

Industry:
- Materials > Containers & Packaging (0.46)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.49)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)