Holistic Evaluation of Text-to-Image Models
Tony Lee, Yifan Mai
Neural Information Processing Systems
The stunning qualitative improvement of text-to-image models has led to widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects: text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/latest
May-26-2025, 03:27:14 GMT
- Country:
- North America > United States > California > Santa Clara County > Palo Alto (0.35)
- Genre:
- Overview (1.00)
- Research Report > New Finding (0.66)
- Industry:
- Health & Medicine > Therapeutic Area (0.46)
- Information Technology (1.00)
- Law > Intellectual Property & Technology Law (0.67)
- Technology: