GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration