Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation
Zhongliang Zhou, Jielu Zhang, Zihan Guan, Mengxuan Hu, Ni Lao, Lan Mu, Sheng Li, Gengchen Mai
arXiv.org Artificial Intelligence
The field of visual recognition has witnessed marked improvement, with state-of-the-art models significantly advancing areas such as object classification [1, 2, 3, 4], object detection [5, 6, 7], semantic segmentation [8, 9, 10, 11], scene parsing [12, 13], disaster response [14, 15], and environmental monitoring [16], among others [17, 18, 19]. As progress continues, the information retrieval community is widening its focus to the prediction of more detailed and intricate attributes of information. A key attribute in this expanded scope is image geolocalization [20, 21, 22], which aims to determine the exact geographic coordinates of a given image. The ability to accurately geolocalize images is crucial, as it enables the deduction of a wide array of related attributes, such as temperature, elevation, crime rate, population density, and income level, providing comprehensive insight into the context surrounding the image. In our study, we predict the geographic coordinates of a photograph solely from the ground-view image. Predictions are considered accurate if they closely match the actual location (Figure 1). Prevailing research approaches fall into two categories: retrieval-based and classification-based methods. Retrieval-based techniques compare a query image against a geo-tagged image database [23, 24, 25, 26, 27, 28, 29, 30], using the location of the database image that most closely matches the query to infer its location.
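The retrieval-based approach described above can be sketched as a nearest-neighbor lookup over image embeddings. The sketch below is illustrative only: the embeddings, database entries, and coordinates are made up, and a real system would use features from a pretrained image encoder and an approximate nearest-neighbor index rather than a brute-force dot product.

```python
import numpy as np

def normalize(v):
    """L2-normalize feature vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def geolocalize(query_feat, db_feats, db_coords):
    """Return the (lat, lon) of the database image most similar to the query.

    Similarity is cosine similarity, computed as a dot product over
    L2-normalized embeddings; the best match's coordinates are returned.
    """
    sims = normalize(db_feats) @ normalize(query_feat)
    return db_coords[int(np.argmax(sims))]

# Toy geo-tagged database: 3 images with made-up 4-d embeddings and coordinates.
db_feats = np.array([
    [1.0, 0.0, 0.0, 0.0],   # hypothetical image near Paris
    [0.0, 1.0, 0.0, 0.0],   # hypothetical image near Tokyo
    [0.0, 0.0, 1.0, 0.0],   # hypothetical image near New York
])
db_coords = [(48.8566, 2.3522), (35.6762, 139.6503), (40.7128, -74.0060)]

query = np.array([0.1, 0.9, 0.05, 0.0])  # closest to the Tokyo embedding
print(geolocalize(query, db_feats, db_coords))  # -> (35.6762, 139.6503)
```

Because the predicted location is simply copied from the best-matching database image, accuracy of this baseline hinges on the coverage and density of the geo-tagged database.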
Mar-28-2024