The Box is in the Pen: Evaluating Commonsense Reasoning in Neural Machine Translation