Confidence as a Reward: Transforming LLMs into Reward Models
