Fine-Grained Semantically Aligned Vision-Language Pre-Training
