Fine-Grained Semantically Aligned Vision-Language Pre-Training

Neural Information Processing Systems 

Experiments show that LOUPE achieves state-of-the-art performance on a variety of vision-language tasks.