Towards Deconfounded Image-Text Matching with Causal Inference
