Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Jun-13-2026, 10:42:18 GMT–Neural Information Processing Systems

Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models struggle to handle noisy images at different timesteps and require complex transformations into pixel space. In this work, we show that pre-trained diffusion models are naturally suited for step-level reward modeling in the noisy latent space, as they are explicitly designed to process latent images at various noise levels.
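As a rough illustration of the step-level setting described above, the sketch below noises two clean latents to the same timestep, scores the noisy latents directly with a timestep-conditioned reward, and applies a Bradley-Terry preference loss. Everything named here is an assumption made for illustration: the linear beta schedule, `alpha_bar`, `add_noise`, `toy_reward` (a hypothetical linear scorer standing in for a reward head on a pre-trained diffusion backbone), and `preference_loss`. It is not the paper's implementation.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction after t steps of an assumed linear beta schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(x0, t, rng):
    """Forward-diffuse a clean latent x0 (list of floats) to timestep t."""
    a = alpha_bar(t)
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


def toy_reward(x_t, t, weights):
    """Hypothetical timestep-conditioned reward on a noisy latent.

    A real step-level reward model would be a network that takes (x_t, t);
    here a linear score is rescaled by the signal level at t so that scores
    remain comparable across noise levels.
    """
    score = sum(w * v for w, v in zip(weights, x_t))
    return score / math.sqrt(alpha_bar(t))


def preference_loss(reward_preferred, reward_rejected):
    """Bradley-Terry loss: -log(sigmoid(r_preferred - r_rejected)), computed stably."""
    d = reward_preferred - reward_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.5, -0.25, 1.0, 0.1]      # hypothetical reward parameters
    preferred = [1.0, -1.0, 2.0, 0.0]     # made-up clean latents
    rejected = [-1.0, 1.0, -2.0, 0.0]
    t = 400                               # both latents are compared at the same step
    r_pref = toy_reward(add_noise(preferred, t, rng), t, weights)
    r_rej = toy_reward(add_noise(rejected, t, rng), t, weights)
    print("loss at step", t, "=", preference_loss(r_pref, r_rej))
```

The reward is evaluated on the noisy latent itself, so no decoding to pixel space is needed at any step; a pixel-level reward model would first have to map the latent back to an image.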

artificial intelligence, machine learning, proceedings

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.70)
  - Vision (0.60)