Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
