Neural Information Processing Systems
Recent advances in diffusion models have dramatically improved image fidelity and diversity. However, aligning these models with nuanced human preferences, such as aesthetics, engagement, and subjective appeal, remains a key challenge because large-scale human annotations are scarce: such data is expensive to collect and limited in diversity. To address this, we leverage the reasoning capabilities of vision-language models (VLMs) and propose Self-Play Reward Optimization (SPRO), a scalable, annotation-free training framework based on multimodal self-play. SPRO learns to jointly align prompt and image generation with human preferences by iteratively generating, evaluating, and learning to refine outputs using synthetic reward signals such as aesthetics and human engagement.
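The generate, evaluate, refine loop described in the abstract can be illustrated with a toy sketch. This is not the SPRO implementation: the abstract gives no algorithmic details, so every name below (`toy_reward`, `propose_variants`, `self_play_round`, `run_self_play`) is hypothetical, the "generator" only appends descriptor words to a text prompt, and the "reward" is a trivial stand-in for a VLM-based synthetic score. The sketch only shows how candidates might be scored and turned into (chosen, rejected) pairs without human annotation.

```python
import random


def toy_reward(prompt: str) -> float:
    """Hypothetical stand-in for a synthetic reward (e.g. a VLM aesthetics score).

    Here it simply counts distinct words, so richer prompts score higher.
    """
    return float(len(set(prompt.split())))


def propose_variants(prompt, vocabulary, rng, n=4):
    """Stand-in generator: each candidate appends one random descriptor."""
    return [f"{prompt} {rng.choice(vocabulary)}" for _ in range(n)]


def self_play_round(prompt, vocabulary, rng):
    """Generate candidates, score them, and return the best and worst."""
    candidates = propose_variants(prompt, vocabulary, rng)
    ranked = sorted(candidates, key=toy_reward, reverse=True)
    return ranked[0], ranked[-1]


def run_self_play(seed_prompt, rounds=3, seed=0):
    """Iterate the loop; the chosen candidate seeds the next round.

    The collected (chosen, rejected) pairs are the kind of synthetic
    preference data a real system could train on (assumption, not from
    the paper).
    """
    rng = random.Random(seed)
    vocabulary = ["vivid", "cinematic", "detailed", "warm", "dramatic"]
    prompt, pairs = seed_prompt, []
    for _ in range(rounds):
        chosen, rejected = self_play_round(prompt, vocabulary, rng)
        pairs.append((chosen, rejected))
        prompt = chosen
    return prompt, pairs


if __name__ == "__main__":
    final_prompt, preference_pairs = run_self_play("a lighthouse at dusk")
    print(final_prompt)
    for chosen, rejected in preference_pairs:
        print(toy_reward(chosen), toy_reward(rejected))
```

In a real system the text-only generator and word-count reward would be replaced by a prompt rewriter, a diffusion model, and a VLM judge; the loop structure is the only part this sketch is meant to convey.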
Jun-18-2026, 20:59:06 GMT
- Country:
- North America > United States (0.67)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Industry:
- Media (1.00)
- Leisure & Entertainment > Games (1.00)
- Information Technology > Security & Privacy (0.67)
- Government > Regional Government
- Technology:
- Information Technology
- Communications > Social Media (0.93)
- Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Natural Language > Large Language Model (1.00)
- Cognitive Science > Problem Solving (0.87)
- Machine Learning > Neural Networks
- Deep Learning (1.00)