Do Vision-Language Models See Urban Scenes as People Do? An Urban Perception Benchmark
