UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Region Profiling
Hao, Xixuan, Chen, Wei, Yan, Yibo, Zhong, Siru, Wang, Kun, Wen, Qingsong, Liang, Yuxuan
arXiv.org Artificial Intelligence
Urban region profiling aims to learn a low-dimensional representation of a given urban area while preserving its characteristics, such as demographics, infrastructure, and economic activities, for urban planning and development. However, prevalent pretrained models, particularly those reliant on satellite imagery, face two challenges. First, concentrating solely on macro-level patterns from satellite data may introduce bias, because it misses nuanced micro-level details such as the architecture of a particular place. Second, the lack of interpretability in pretrained models limits their utility in providing transparent evidence for urban planning. In response to these issues, we devise a novel framework entitled UrbanVLP based on Vision-Language Pretraining. UrbanVLP integrates multi-granularity information from both the macro (satellite) and micro (street-view) levels, overcoming the limitations of prior pretrained models. Moreover, it introduces automatic text generation and calibration, improving interpretability in downstream applications by producing high-quality text descriptions of urban imagery. Experiments conducted across six urban indicator prediction tasks demonstrate its superior performance.
May-29-2024
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia
- North America
- Dominican Republic (0.04)
- United States > New York
- New York County > New York City (0.04)
- Genre:
- Research Report > New Finding (0.67)
- Industry:
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (0.93)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Data Science (1.00)
- Information Management (1.00)
- Sensing and Signal Processing > Image Processing (1.00)