Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment