Evaluating Compositional Scene Understanding in Multimodal Generative Models

Open in new window