Decipher the Modality Gap in Multimodal Contrastive Learning: From Convergent Representations to Pairwise Alignment
