AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models
