A Theorem-Proving-Based Evaluation of Neural Semantic Parsing