Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models
