Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding