Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss