MazeEval: A Benchmark for Testing Sequential Decision-Making in Language Models
