Intra-Trajectory Consistency for Reward Modeling
