Robust Multimodal Learning via Representation Decoupling