Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data