A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models
Neural Information Processing Systems
Direct Preference Optimization (DPO) is a key framework for aligning text-to-image models with human preferences. Stepwise Preference Optimization (SPO) extends it by using intermediate steps for preference learning, which yields more aesthetically pleasing images at significantly lower computational cost.
Jun-19-2026, 12:38:48 GMT