GLEAM: Learning to Match and Explain in Cross-View Geo-Localization
Lu, Xudong, Zheng, Zhi, Wan, Yi, Yao, Yongxiang, Wang, Annan, Zhang, Renrui, Xia, Panwang, Wu, Qiong, Li, Qingyun, Lin, Weifeng, Zhao, Xiangyu, Ma, Peifeng, Yang, Xue, Li, Hongsheng
arXiv.org Artificial Intelligence
Cross-View Geo-Localization (CVGL) focuses on identifying correspondences between images of the same geographical location captured from distinct perspectives. However, existing CVGL approaches are typically restricted to a single view or modality, and their direct visual matching strategy lacks interpretability: they only determine whether two images correspond, without explaining the rationale behind the match. In this paper, we present GLEAM-C, a foundational CVGL model that unifies multiple views and modalities (UAV imagery, street maps, panoramic views, and ground photographs) by aligning each of them exclusively with satellite imagery. Our framework improves training efficiency through an optimized implementation and, with a two-phase training strategy, achieves accuracy comparable to prior modality-specific CVGL models. Moreover, to address the lack of interpretability in traditional CVGL methods, we leverage the reasoning capabilities of multimodal large language models (MLLMs) to propose a new task, GLEAM-X, which combines cross-view correspondence prediction with explainable reasoning. To support this task, we construct a bilingual benchmark, using GPT-4o and Doubao-1.5-Thinking-Vision-Pro to generate the training and testing data. The test set is further refined through detailed human revision, enabling systematic evaluation of explainable cross-view reasoning and advancing transparency and scalability in geo-localization. Together, GLEAM-C and GLEAM-X form a comprehensive CVGL pipeline that integrates multi-modal, multi-view alignment with interpretable correspondence analysis, unifying accurate cross-view matching with explainable reasoning and advancing Geo-Localization by enabling models to better Explain And Match. Code and datasets used in this work will be made publicly available at https://github.com/Lucky-Lance/GLEAM.
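The abstract states that GLEAM-C aligns every view and modality with satellite imagery but does not name the training objective. The sketch below illustrates one common way to express "align queries with satellite images" and to score the resulting retrieval; it is an assumption for illustration only, not the authors' method. It uses a symmetric InfoNCE (contrastive) loss over paired embeddings plus Recall@1. All function names (`symmetric_info_nce`, `recall_at_1`) and the `temperature=0.07` default are hypothetical choices, and the embeddings are toy vectors rather than outputs of any real encoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_matrix(queries: Sequence[Vector], satellites: Sequence[Vector]) -> List[List[float]]:
    # sims[i][j] = cosine similarity between query i (UAV, map, panorama, ...) and satellite j.
    q = [l2_normalize(v) for v in queries]
    s = [l2_normalize(v) for v in satellites]
    return [[sum(a * b for a, b in zip(qi, sj)) for sj in s] for qi in q]


def _cross_entropy_diagonal(logits: List[List[float]]) -> float:
    # Mean of -log softmax(row)[i]: the i-th row's positive sits on the diagonal.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(queries, satellites, temperature: float = 0.07) -> float:
    # Hypothetical objective: pair i is (query i, satellite i); all others are negatives.
    sims = similarity_matrix(queries, satellites)
    logits = [[x / temperature for x in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    # Average the query->satellite and satellite->query directions.
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


def recall_at_1(queries, satellites) -> float:
    # Fraction of queries whose most similar satellite image is their true match.
    sims = similarity_matrix(queries, satellites)
    hits = sum(1 for i, row in enumerate(sims)
               if max(range(len(row)), key=row.__getitem__) == i)
    return hits / len(sims)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]      # toy query embeddings
    satellites = [[0.9, 0.1], [0.1, 0.9]]   # toy satellite embeddings, correctly paired
    print(recall_at_1(queries, satellites))
    print(symmetric_info_nce(queries, satellites))
```

Treating satellite imagery as the single shared anchor means each query modality only needs a pairing with satellite images rather than with every other modality, which is one plausible reading of "aligning them exclusively with satellite imagery."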
Sep-29-2025
- Country:
  - Asia > China
    - Heilongjiang Province > Harbin (0.04)
    - Hong Kong (0.04)
    - Hubei Province > Wuhan (0.04)
    - Shanghai > Shanghai (0.04)
  - Europe > Germany
    - Bavaria > Upper Bavaria > Munich (0.04)
  - North America > United States (0.14)
- Genre:
  - Research Report (1.00)