Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning