West-of-N: Synthetic Preference Generation for Improved Reward Modeling
