gradspo
A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models
Direct Preference Optimization (DPO) is a key framework for aligning text-to-image models with human preferences. Stepwise Preference Optimization (SPO) extends DPO by performing preference learning on intermediate denoising steps rather than only on final images, which produces more aesthetically pleasing images at significantly lower computational cost.
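To make "preference learning on intermediate steps" concrete, here is a minimal sketch of a DPO-style loss applied to a single denoising transition x_t -> x_{t-1}. It is an illustration under stated assumptions, not the method of this paper. The assumptions are as follows. Two candidate samples of x_{t-1} are drawn from the same x_t and ranked as "win" and "lose" by some preference model. Each transition is an isotropic Gaussian whose variance `sigma` is shared by the trained and frozen reference models, as in DDPM-style samplers. `beta` is the usual DPO temperature. All function names (`gaussian_log_prob`, `stepwise_dpo_loss`) are hypothetical.

```python
import math
from typing import Sequence


def gaussian_log_prob(x: Sequence[float], mean: Sequence[float], sigma: float) -> float:
    """Log-density of x under an isotropic Gaussian N(mean, sigma^2 I)."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma**2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def stepwise_dpo_loss(
    x_win: Sequence[float],
    x_lose: Sequence[float],
    mean_theta: Sequence[float],
    mean_ref: Sequence[float],
    sigma: float,
    beta: float,
) -> float:
    """Hypothetical DPO loss for one denoising step x_t -> x_{t-1}.

    x_win, x_lose: two candidate x_{t-1} samples from the same x_t,
        assumed to be ranked by an external preference model.
    mean_theta, mean_ref: means of p(x_{t-1} | x_t) under the trained
        model and the frozen reference model (shared variance sigma^2).
    """
    # Log-likelihood ratio of trained vs. reference model for each candidate;
    # the Gaussian normalizing constants cancel because sigma is shared.
    margin_win = gaussian_log_prob(x_win, mean_theta, sigma) - gaussian_log_prob(x_win, mean_ref, sigma)
    margin_lose = gaussian_log_prob(x_lose, mean_theta, sigma) - gaussian_log_prob(x_lose, mean_ref, sigma)
    # Standard DPO form: -log sigmoid(beta * (win margin - lose margin)).
    return -log_sigmoid(beta * (margin_win - margin_lose))


if __name__ == "__main__":
    # Trained model identical to the reference: loss equals log 2.
    print(stepwise_dpo_loss([1.0], [-1.0], [0.0], [0.0], sigma=1.0, beta=1.0))
    # Trained mean shifted toward the preferred sample: loss drops below log 2.
    print(stepwise_dpo_loss([1.0], [-1.0], [0.5], [0.0], sigma=1.0, beta=1.0))
```

The loss equals log 2 when the trained model matches the reference. It falls below log 2 as the trained transition assigns relatively more likelihood to the preferred candidate. Because the pair shares the same x_t, the comparison isolates the effect of that one step.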