Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation
Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang
Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing long and short classifier-free guidance (LSG) to efficiently distill pretrained Stable Diffusion models without using real training data. SiD aims to optimize a model-based explicit score matching loss, utilizing a score-identity-based approximation alongside the proposed LSG for practical computation. By training exclusively with fake images synthesized with its one-step generator, SiD equipped with LSG rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score.
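For readers unfamiliar with the guidance mechanism the abstract refers to, the sketch below shows standard classifier-free guidance (CFG), the building block that the proposed long and short guidance (LSG) adapts during distillation. This is a minimal illustration, not the authors' code: how LSG distributes guidance scales across the pretrained teacher and the fake score network is specific to the paper and is not reproduced here, and the tensor shapes and function names are hypothetical.

```python
# Minimal sketch of classifier-free guidance (CFG) on score / noise predictions.
# Assumption: predictions come from a Stable-Diffusion-style latent model; the
# helper name `cfg_score` and the toy shapes below are illustrative only.

import torch


def cfg_score(score_cond: torch.Tensor,
              score_uncond: torch.Tensor,
              guidance_scale: float) -> torch.Tensor:
    """Combine conditional and unconditional predictions.

    guidance_scale = 1.0 recovers the purely conditional prediction;
    larger values push samples more strongly toward the text prompt.
    """
    return score_uncond + guidance_scale * (score_cond - score_uncond)


if __name__ == "__main__":
    # Random tensors stand in for network outputs (e.g., latent-space noise predictions).
    s_cond = torch.randn(1, 4, 64, 64)    # prediction conditioned on the text prompt
    s_uncond = torch.randn(1, 4, 64, 64)  # prediction for the empty (null) prompt
    guided = cfg_score(s_cond, s_uncond, guidance_scale=1.5)
    print(guided.shape)
```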
Jun-22-2024