S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions