AITopics | t2i-compbench

f8ad010cdd9143dbb0e9308c093aff24-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-30-2026, 08:41:09 GMT

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Neural Information Processing SystemsFeb-18-2026, 01:20:10 GMT

We introduce a new approach, Generative mOdel finetun-ing with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models.

artificial intelligence, machine learning, text prompt, (16 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Israel (0.04)
Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Neural Information Processing SystemsDec-27-2025, 06:18:20 GMT

Despite the stunning ability to generate high-quality images by recent text-to-image models, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation and explore the potential and limitations of multimodal LLMs for evaluation. We introduce a new approach, Generative mOdel finetuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. Extensive experiments and evaluations are conducted to benchmark previous methods on T2I-CompBench, and to validate the effectiveness of our proposed evaluation metrics and GORS approach.

comprehensive benchmark, open-world compositional text-to-image generation, t2i-compbench, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

Improving Chain-of-Thought Efficiency for Autoregressive Image Generation

Gu, Zeqi, Georgopoulos, Markos, Dai, Xiaoliang, Ghazvininejad, Marjan, Wang, Chu, Juefei-Xu, Felix, Li, Kunpeng, Shi, Yujun, He, Zecheng, He, Zijian, Zhou, Jiawei, Davis, Abe, Wang, Jialiang

arXiv.org Artificial IntelligenceOct-8-2025

Autoregressive multimodal large language models have recently gained popularity for image generation, driven by advances in foundation models. To enhance alignment and detail, newer approaches employ chain-of-thought (CoT) reasoning, expanding user inputs into elaborated prompts prior to image synthesis. However, this strategy can introduce unnecessary redundancy -- a phenomenon we call visual overthinking -- which increases computational costs and can introduce details that contradict the original prompt. In this work, we explore how to generate more concise CoT sequences for more efficient image generation. We introduce ShortCoTI, a lightweight optimization framework that encourages more concise CoT while preserving output image quality. ShortCoTI rewards more concise prompts with an adaptive function that scales according to an estimated difficulty for each task. Incorporating this reward into a reinforcement learning paradigm reduces prompt reasoning length by 54% while maintaining or slightly improving quality metrics across multiple benchmarks (T2I-CompBench, GenEval). Qualitative analysis shows that our method eliminates verbose explanations and repetitive refinements, producing reasoning prompts that are both concise and semantically rich. As a result, ShortCoTI improves computational efficiency without compromising the fidelity or visual appeal of generated images.

arxiv preprint arxiv, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.05593

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

On Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation

Seo, Hoigi, Bang, Junseo, Lee, Haechang, Lee, Joohoon, Lee, Byung Hyun, Chun, Se Young

arXiv.org Artificial IntelligenceMar-29-2025

Text-to-Image (T2I) models often suffer from text-image misalignment in complex scenes involving multiple objects and attributes. Semantic binding aims to mitigate this issue by accurately associating the generated attributes and objects with their corresponding noun phrases (NPs). Existing methods rely on text or latent optimizations, yet the factors influencing semantic binding remain underexplored. Here we investigate the geometrical properties of text token embeddings and their cross-attention (CA) maps. We empirically and theoretically analyze that the geometrical properties of token embeddings, specifically both angular distances and norms, play a crucial role in CA map differentiation. Then, we propose \textbf{TeeMo}, a training-free text embedding-aware T2I framework with strong semantic binding. TeeMo consists of Causality-Aware Projection-Out (CAPO) for distinct inter-NP CA maps and Adaptive Token Mixing (ATM) with our loss to enhance inter-NP separation while maintaining intra-NP cohesion in CA maps. Extensive experiments confirm TeeMo consistently outperforms prior arts across diverse baselines and datasets.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.23011

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Neural Information Processing SystemsJan-20-2025, 02:58:01 GMT

Despite the stunning ability to generate high-quality images by recent text-to-image models, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation and explore the potential and limitations of multimodal LLMs for evaluation. We introduce a new approach, Generative mOdel finetuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. Extensive experiments and evaluations are conducted to benchmark previous methods on T2I-CompBench, and to validate the effectiveness of our proposed evaluation metrics and GORS approach.

comprehensive benchmark, open-world compositional text-to-image generation, t2i-compbench, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

Zhang, Fan, Tian, Shulin, Huang, Ziqi, Qiao, Yu, Liu, Ziwei

arXiv.org Artificial IntelligenceDec-15-2024

Recent advancements in visual generative models have enabled high-quality image and video generation, opening diverse applications. However, evaluating these models often demands sampling hundreds or thousands of images or videos, making the process computationally expensive, especially for diffusion-based models with inherently slow sampling. Moreover, existing evaluation methods rely on rigid pipelines that overlook specific user needs and provide numerical results without clear explanations. In contrast, humans can quickly form impressions of a model's capabilities by observing only a few samples. To mimic this, we propose the Evaluation Agent framework, which employs human-like strategies for efficient, dynamic, multi-round evaluations using only a few samples per round, while offering detailed, user-tailored analyses. It offers four key advantages: 1) efficiency, 2) promptable evaluation tailored to diverse user needs, 3) explainability beyond single numerical scores, and 4) scalability across various models and tools. Experiments show that Evaluation Agent reduces evaluation time to 10% of traditional methods while delivering comparable results. The Evaluation Agent framework is fully open-sourced to advance research in visual generative models and their efficient evaluation.

evaluation, evaluation agent, query, (12 more...)

arXiv.org Artificial Intelligence

2412.09645

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)

Add feedback

Filters

Collaborating Authors

t2i-compbench

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

f8ad010cdd9143dbb0e9308c093aff24-Paper-Datasets_and_Benchmarks.pdf

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Improving Chain-of-Thought Efficiency for Autoregressive Image Generation

On Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models