SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
