Robust Preference Optimization through Reward Model Distillation