Calibrating Multimodal Learning