On the Domain Robustness of Contrastive Vision-Language Models