S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions

Sangwoo Mo, Minkyu Kim, Kyungmin Lee, Jinwoo Shin

Neural Information Processing Systems 

Several studies have attempted to reduce the number of image-text pairs used for vision-language pre-training.
