SLIP: Structural-aware Language-Image Pretraining for Vision-Language Alignment
