GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration
