$\beta$-calibration of Language Model Confidence Scores for Generative QA
