WPO: Enhancing RLHF with Weighted Preference Optimization
