How often are errors in natural language reasoning due to paraphrastic variability?