Contrastive vision-language learning with paraphrasing and negation