A.1 Qualitative Results of Bench

Jun-22-2026, 21:24:40 GMT–Neural Information Processing Systems

Figure 5: Word clouds of text prompts for the text-only generation (T2I) task (left) and the multimodal generation task (right).

Figure 5 visually summarizes the prominent semantic elements in the benchmark prompts for the text-only (T2I) and multimodal generation tasks. The differences between the two word clouds reflect task-specific features of MMGen-Bench: T2I prompts emphasize spatial and descriptive details, while multimodal prompts more frequently involve social and interactive scenarios.

Aspect       Objects  Relations  Attributes  Counting  Overall
Spearman ω   0.469    0.909      0.601       0.839     0.699

As depicted in Figure 6, the distribution of aspect types differs notably between the text-only generation (T2I) and multimodal generation tasks. In the T2I setting, "Objects" dominate with 38.3%, while "Attributes" and "Relations" also constitute substantial proportions (33.9% and 25.4%, respectively).
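
The per-aspect Spearman values reported above are rank correlations. As a minimal sketch of how such a coefficient can be computed, the Python below ranks two score lists (averaging ranks over ties) and takes the Pearson correlation of the ranks. This excerpt does not state which two quantities are being correlated, so the lists `metric_scores` and `human_scores` are hypothetical illustration data, and the function names are likewise assumptions rather than anything from the benchmark's code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: five samples scored by two different raters.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
human_scores = [1, 2, 3, 5, 4]
coefficient = spearman(metric_scores, human_scores)
```

Because only the ordering of the scores matters, the two lists may be on entirely different scales. A value near 1 means the two lists rank the samples almost identically.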

