Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
