A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering
