Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization
