Neural Information Processing Systems 

With advances in the quality of text-to-image (T2I) models has come interest in benchmarking their prompt faithfulness: the semantic coherence of generated images with the prompts they were conditioned on. A variety of T2I faithfulness metrics have been proposed, leveraging advances in cross-modal embeddings and vision-language models (VLMs).
