Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
Abulhair Saparov, Richard Yuanzhe Pang