WPO: Enhancing RLHF with Weighted Preference Optimization