SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment