Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty