Teaching LLMs to Abstain via Fine-Grained Semantic Confidence Reward
