An Object-Focused Framework for Evaluating Text-to-Image Alignment

Neural Information Processing Systems 

Recent breakthroughs in diffusion models, multimodal pretraining, and efficient finetuning have led to an explosion of text-to-image generative models.