Towards Deconfounded Image-Text Matching with Causal Inference