Improving Open-Domain Dialogue Evaluation with a Causal Inference Model