Pragmatic Inference with a CLIP Listener for Contrastive Captioning