ConfClip: Confidence-Weighted and Clipped Reward for Reinforcement Learning in LLMs
