Improving Multimodal Contrastive Learning of Sentence Embeddings with Object-Phrase Alignment