Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
