Personalized and Sequential Text-to-Image Generation

Nabati, Ofir, Tennenholtz, Guy, Hsu, ChihWei, Ryu, Moonkyung, Ramachandran, Deepak, Chow, Yinlam, Li, Xiang, Boutilier, Craig

Dec-9-2024–arXiv.org Artificial Intelligence

We address the problem of personalized, interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent that iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an expectation-maximization (EM) strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest a personalized and diverse slate of prompt expansions to the user. Our Personalized And Sequential Text-to-image Agent (PASTA) extends T2I models with personalized multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also release our sequential rater dataset and simulated user-rater interactions to support future research in personalized, multi-turn T2I generation.

large language model, machine learning, reinforcement learning

Country:
- Asia > Macao (0.04)
- North America
  - United States > Louisiana
    - Orleans Parish > New Orleans (0.04)
  - Canada > Alberta
    - Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Transportation > Ground > Rail (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning > Personal Assistant Systems (1.00)
  - Natural Language > Large Language Model (0.93)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.93)
    - Reinforcement Learning (0.87)