InstructBooth: Instruction-following Personalized Text-to-Image Generation

Daewon Chae, Nokyung Park, Jinkyu Kim, Kimin Lee

Dec-4-2023–arXiv.org Artificial Intelligence

Personalizing text-to-image models with a limited set of images of a specific object has been explored in subject-specific image generation. However, existing methods often struggle to follow text prompts because they overfit to the limited training images. In this work, we introduce InstructBooth, a novel method designed to enhance image-text alignment in personalized text-to-image models. Our approach first personalizes a text-to-image model with a small number of subject-specific images using a unique identifier. After personalization, we fine-tune the personalized model using reinforcement learning to maximize a reward that quantifies image-text alignment. Additionally, we propose complementary techniques to increase the synergy between these two processes. Our method demonstrates superior image-text alignment compared to baselines while maintaining personalization ability. In human evaluations, InstructBooth outperforms DreamBooth when all evaluation factors are considered together.
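The second stage described above, fine-tuning with reinforcement learning to maximize an alignment reward, can be illustrated with a toy policy-gradient loop. This is only a minimal sketch and not the InstructBooth implementation: the "model" is a three-way categorical policy rather than a diffusion model, the update is plain REINFORCE with a moving-average baseline, and `reinforce_finetune`, `toy_reward`, and the `ALIGNMENT` scores are hypothetical stand-ins invented for this example. The initial logits are chosen to mimic a personalized model that favors an output with poor prompt alignment.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_finetune(logits, reward_fn, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE: nudge a categorical policy toward high-reward actions."""
    rng = random.Random(seed)
    logits = list(logits)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return logits


# Hypothetical alignment scores: action 2 matches the prompt best.
ALIGNMENT = [0.1, 0.3, 0.9]


def toy_reward(action):
    """Stand-in for a learned image-text alignment reward (assumption)."""
    return ALIGNMENT[action]


# Start from a policy that strongly prefers the poorly aligned action 0.
initial_logits = [2.0, 0.0, -1.0]
tuned_logits = reinforce_finetune(initial_logits, toy_reward)
initial_probs = softmax(initial_logits)
tuned_probs = softmax(tuned_logits)
print("before:", [round(p, 3) for p in initial_probs])
print("after: ", [round(p, 3) for p in tuned_probs])
```

In this toy setting the probability mass should shift from the initially preferred action toward the one with the highest reward. The sketch omits anything resembling the complementary techniques the abstract mentions for preserving personalization during reward fine-tuning.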

artificial intelligence, instructbooth, machine learning

Genre:
- Research Report > Experimental Study (0.34)

Industry:
- Leisure & Entertainment > Sports (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks (0.86)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)