A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models