Jun-18-2026, 20:59:06 GMT–Neural Information Processing Systems

Recent advances in diffusion models have dramatically improved image fidelity and diversity. However, aligning these models with nuanced human preferences, such as aesthetics, engagement, and subjective appeal, remains a key challenge because large-scale human annotations are scarce. Collecting such data is expensive, and the resulting datasets are limited in diversity. To address this, we leverage the reasoning capabilities of vision-language models (VLMs) and propose Self-Play Reward Optimization (SPRO), a scalable, annotation-free training framework based on multimodal self-play. SPRO learns to jointly align prompt and image generation with human preferences by iteratively generating outputs, evaluating them, and learning to refine them, using synthetic reward signals for qualities such as aesthetics and human engagement.
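The generate, evaluate, and refine loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how candidates are produced or how the reward is used for learning, so the helper names (`toy_refine`, `toy_generate`, `toy_reward`, `self_play_round`, `run_self_play`) and the idea of collecting (chosen, rejected) pairs are assumptions made purely for illustration. In a real system the stand-ins would presumably be a VLM that rewrites prompts, a diffusion model that renders images, and a learned reward model.

```python
from typing import Callable, List, Tuple

# Hypothetical style hints a prompt refiner might append (illustrative only).
STYLE_HINTS = ["soft lighting", "vivid colors", "wide angle", "high detail"]


def toy_refine(prompt: str, i: int) -> str:
    """Stand-in for a VLM that proposes a refined prompt."""
    return f"{prompt}, {STYLE_HINTS[i % len(STYLE_HINTS)]}"


def toy_generate(prompt: str) -> str:
    """Stand-in for a diffusion model; the 'image' is just the prompt text."""
    return prompt


def toy_reward(image: str) -> float:
    """Stand-in for a synthetic reward: counts distinct words in the 'image'."""
    return float(len(set(image.replace(",", " ").split())))


def self_play_round(
    prompt: str,
    refine: Callable[[str, int], str],
    generate: Callable[[str], str],
    reward: Callable[[str], float],
    num_candidates: int = 4,
) -> Tuple[str, str, float]:
    """Generate candidates, score them, and return (best, worst, reward margin)."""
    candidates = [prompt] + [refine(prompt, i) for i in range(num_candidates)]
    scored = [(reward(generate(c)), c) for c in candidates]
    best = max(scored, key=lambda s: s[0])
    worst = min(scored, key=lambda s: s[0])
    return best[1], worst[1], best[0] - worst[0]


def run_self_play(prompt: str, rounds: int = 3) -> Tuple[str, List[Tuple[str, str]]]:
    """Iterate rounds, keeping the best prompt and logging preference pairs."""
    pairs: List[Tuple[str, str]] = []
    current = prompt
    for _ in range(rounds):
        chosen, rejected, margin = self_play_round(
            current, toy_refine, toy_generate, toy_reward
        )
        if margin > 0:  # only keep pairs where the reward separates them
            pairs.append((chosen, rejected))
        current = chosen  # the next round refines the current best prompt
    return current, pairs


if __name__ == "__main__":
    final_prompt, preference_pairs = run_self_play("a cat", rounds=2)
    print(final_prompt)
    print(preference_pairs)
```

In this sketch the collected pairs are only logged; a training framework would feed such pairs, or the raw reward scores, into whatever optimization objective it uses, which the abstract leaves unspecified.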

large language model, machine learning, natural language


Country:
- North America > United States (0.67)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Media (1.00)
- Leisure & Entertainment > Games (1.00)
- Information Technology > Security & Privacy (0.67)
- Government > Regional Government
  - North America Government > United States Government (0.67)

Technology:
- Information Technology
  - Communications > Social Media (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language > Large Language Model (1.00)
    - Cognitive Science > Problem Solving (0.87)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
