A Systematic Evaluation of Large Language Models on Out-of-Distribution Logical Reasoning Tasks