On the Comparison between Multi-modal and Single-modal Contrastive Learning