USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning of LLMs as Urban Agents
