Honesty over Accuracy: Trustworthy Language Models through Reinforced Hesitation

Open in new window