On the Capabilities and Limitations of Reasoning for Natural Language Understanding