On Robustness in Multimodal Learning