Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios