CounterfactualContrastiveLearningfor Weakly-SupervisedVision-LanguageGrounding

Open in new window