Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
