PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
