Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
