Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
Neural Information Processing Systems
Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models struggle to handle noisy images at different timesteps and require complex transformations into pixel space. In this work, we show that pre-trained diffusion models are naturally suited for step-level reward modeling in the noisy latent space, as they are explicitly designed to process latent images at various noise levels.
Jun-13-2026, 10:42:18 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (0.70)
- Vision (0.60)