Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples