Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation