CityLens: Benchmarking Large Language-Vision Models for Urban Socioeconomic Sensing

Tianhui Liu, Jie Feng, Hetian Pang, Xin Zhang, Tianjian Ouyang, Zhiyuan Zhang, Yong Li

arXiv.org Artificial Intelligence 

Understanding urban socioeconomic conditions through visual data is a challenging yet essential task for sustainable urban development and policy planning. In this work, we introduce CityLens, a comprehensive benchmark designed to evaluate the capabilities of large language-vision models (LLVMs) in predicting socioeconomic indicators from satellite and street view imagery. We construct a multi-modal dataset covering 17 globally distributed cities and spanning six key domains: economy, education, crime, transport, health, and environment, reflecting the multifaceted nature of urban life. Based on this dataset, we define 11 prediction tasks and adopt three evaluation paradigms: Direct Metric Prediction, Normalized Metric Estimation, and Feature-Based Regression. We benchmark 17 state-of-the-art LLVMs across these tasks. Our results reveal that while LLVMs demonstrate promising perceptual and reasoning capabilities, they still exhibit limitations in predicting urban socioeconomic indicators. CityLens provides a unified framework for diagnosing these limitations and guiding future efforts to use LLVMs to understand and predict urban socioeconomic patterns. Our code and datasets are open-sourced at https://github.com/tsinghua-fib-lab/CityLens.
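
To make the third evaluation paradigm concrete, the sketch below shows one common way a Feature-Based Regression setup can be realized in Python: image embeddings produced by a vision-language model are treated as frozen features, and a linear probe is fit to predict a socioeconomic indicator per region. This is a minimal, hypothetical example, not the benchmark's actual pipeline; the random arrays stand in for LLVM embeddings and ground-truth indicators, and CityLens's real feature extractor, regressor, and metrics may differ.

```python
# Minimal sketch of a feature-based regression probe (assumptions: ridge
# regression as the probe, R^2 as the metric; placeholder synthetic data).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# One embedding per city region (e.g., pooled from satellite or street view
# imagery by an LLVM encoder) and one target indicator per region.
n_regions, dim = 500, 768
X = rng.normal(size=(n_regions, dim))  # stand-in for frozen LLVM embeddings
y = X[:, :8].sum(axis=1) + rng.normal(scale=0.5, size=n_regions)  # stand-in indicator

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

probe = Ridge(alpha=1.0).fit(X_train, y_train)  # linear probe over frozen features
print(f"R^2 on held-out regions: {r2_score(y_test, probe.predict(X_test)):.3f}")
```

A linear probe over frozen features is a standard way to test whether a model's representations encode a target quantity, independent of the answers the model gives when prompted directly, which is what distinguishes this paradigm from Direct Metric Prediction.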