Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
