On the Value of Cross-Modal Misalignment in Multimodal Representation Learning
