RM-R1: Reward Modeling as Reasoning
