Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
