CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models