Evaluating statistical language models as pragmatic reasoners