Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
Abulhair Saparov, Richard Yuanzhe Pang
