Subject-driven Text-to-Image Generation via Apprenticeship Learning

Jan-18-2025, 19:01:27 GMT–Neural Information Processing Systems

Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject, by fine-tuning an expert model'' for a given subject from a few examples.However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine tuning with {in-context} learning.Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization.SuTI is powered by {apprenticeship learning}, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual subject. We adopt these clusters to train a massive number of expert models, each specializing in a different subject. The apprentice model SuTI then learns to imitate the behavior of these fine-tuned experts. SuTI can generate high-quality and customized subject-specific images 20x faster than optimization-based SoTA methods.

apprenticeship learning, expert model, subject-driven text-to-image generation, (3 more...)

Neural Information Processing Systems

Jan-18-2025, 19:01:27 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning
    - Reinforcement Learning (0.64)
    - Neural Networks (0.64)