DDO-RM for LLM Preference Optimization: A Minimal Held-Out Benchmark against DPO
