Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective