Subject-driven Text-to-Image Generation via Apprenticeship Learning

Chen, Wenhu, Hu, Hexiang, Li, Yandong, Ruiz, Nataniel, Jia, Xuhui, Chang, Ming-Wei, Cohen, William W.

Oct-2-2023–arXiv.org Artificial Intelligence

Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject by fine-tuning an "expert model" for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual subject. We use these clusters to train a massive number of expert models, each specializing in a different subject. The apprentice model SuTI then learns to imitate the behavior of these fine-tuned experts. SuTI can generate high-quality, customized subject-specific images 20x faster than optimization-based state-of-the-art (SoTA) methods. On the challenging DreamBench and DreamBench-v2 benchmarks, our human evaluation shows that SuTI significantly outperforms existing models such as InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen and DreamBooth, especially on subject and text alignment.
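
The data flow described in the abstract (image clusters, one fine-tuned expert per cluster, a single apprentice trained on the experts' outputs) can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: `Cluster`, `fine_tune_expert`, and `build_apprentice_dataset` are hypothetical names, an "image" is stood in for by a tuple of floats, and the toy "expert" merely averages its cluster instead of fine-tuning an image generation model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = Tuple[float, ...]  # toy stand-in for an image (assumption)


@dataclass
class Cluster:
    """One mined cluster: a few images of a single subject plus target prompts."""
    subject_id: str
    images: List[Image]
    prompts: List[str]


def fine_tune_expert(cluster: Cluster) -> Callable[[str], Image]:
    """Hypothetical per-subject expert.

    A real expert would be a fine-tuned text-to-image model; this toy version
    memorizes the cluster's mean image and shifts it by the prompt word count.
    """
    n = len(cluster.images)
    mean = tuple(sum(column) / n for column in zip(*cluster.images))

    def expert(prompt: str) -> Image:
        shift = float(len(prompt.split()))
        return tuple(value + shift for value in mean)

    return expert


def build_apprentice_dataset(clusters: List[Cluster]):
    """Collect (demonstrations, prompt, expert output) triples.

    The apprentice would be trained to map (demonstrations, prompt) to the
    expert's output, so that no per-subject fine-tuning is needed at test time.
    """
    examples = []
    for cluster in clusters:
        expert = fine_tune_expert(cluster)  # one expert per subject
        for prompt in cluster.prompts:
            examples.append((cluster.images, prompt, expert(prompt)))
    return examples


clusters = [
    Cluster("dog", [(0.0, 2.0), (2.0, 4.0)], ["a dog on a beach"]),
    Cluster("mug", [(1.0, 1.0)], ["a mug on a desk", "a mug in snow"]),
]
dataset = build_apprentice_dataset(clusters)
print(len(dataset), dataset[0][2])
```

The point of the sketch is the shape of the training data: every example pairs a subject's demonstrations and a prompt with the output of that subject's own expert, so a single model trained on the pooled examples sees many subjects at once.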

Country:
- Asia
  - Middle East
    - Saudi Arabia > Northern Borders Province
      - Arar (0.04)
    - Israel > Tel Aviv District
      - Tel Aviv (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:
- Research Report (0.40)

Industry:
- Transportation (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning
    - Neural Networks (1.00)
    - Reinforcement Learning (0.85)
