Classifier Calibration at Scale: An Empirical Study of Model-Agnostic Post-Hoc Methods