Weighted-Reward Preference Optimization for Implicit Model Fusion