Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study
