Decipher the Modality Gap in Multimodal Contrastive Learning: From Convergent Representations to Pairwise Alignment