SPRO: Improving Image Generation via Self-Play

Jun-12-2026, 22:47:45 GMT–Neural Information Processing Systems

Recent advances in diffusion models have dramatically improved image fidelity and diversity. However, aligning these models with nuanced human preferences -such as aesthetics, engagement, and subjective appeal remains a key challenge due to the scarcity of large-scale human annotations. Collecting such data is both expensive and limited in diversity. To address this, we leverage the reasoning capabilities of vision-language models (VLMs) and propose Self-Play Reward Optimization (SPRO), a scalable, annotation-free training framework based on multimodal self-play. SPRO learns to jointly align prompt and image generation with human preferences by iteratively generating, evaluating, and learning to refine outputs using synthetic reward signals such as aesthetics and human engagement.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Jun-12-2026, 22:47:45 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)