Robust Preference Optimization through Reward Model Distillation
