WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models
