An Empirical Study Into What Matters for Calibrating Vision-Language Models