NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization

Zhang, Zheyuan; Li, Runze; Kabir, Tasnim; Boyd-Graber, Jordan

Feb-20-2025, arXiv.org Artificial Intelligence

Image geo-localization is the task of predicting the specific location of an image, and it requires complex reasoning across visual, geographical, and cultural contexts. While Vision Language Models (VLMs) currently achieve the best accuracy on this task, there is a dearth of high-quality datasets and models for analytical reasoning. We first create NaviClues, a high-quality dataset derived from GeoGuessr, a popular geography game, to supply examples of expert reasoning expressed in natural language. Using this dataset, we present Navig, a comprehensive image geo-localization framework that integrates global and fine-grained image information. By reasoning with language, Navig reduces the average distance error by 14% compared to previous state-of-the-art models while requiring fewer than 1,000 training samples. Our dataset and code are available at https://github.com/SparrowZheyuan18/Navig/.
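The headline result above is stated as an average distance error. The abstract does not define the metric, so the sketch below assumes the common choice for geo-localization: the mean great-circle (haversine) distance in kilometres between predicted and ground-truth (latitude, longitude) pairs, on a spherical Earth of radius 6371 km. It also shows how a relative reduction such as 14% is computed. The function names are hypothetical and are not taken from the Navig codebase.

```python
import math
from typing import Sequence, Tuple

# Assumption: spherical Earth with mean radius in km.
EARTH_RADIUS_KM = 6371.0

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def mean_distance_error_km(preds: Sequence[LatLon], truths: Sequence[LatLon]) -> float:
    """Average haversine distance between paired predictions and ground truths."""
    if not preds or len(preds) != len(truths):
        raise ValueError("preds and truths must be non-empty and of equal length")
    return sum(haversine_km(p, t) for p, t in zip(preds, truths)) / len(preds)


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction relative to a baseline (0.14 means 14%)."""
    return (baseline_error - new_error) / baseline_error
```

As a purely hypothetical illustration (these numbers are not from the paper), a baseline mean error of 1000 km and a new mean error of 860 km gives `relative_reduction(1000.0, 860.0)` of about 0.14, that is, a 14% reduction.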

accuracy, information, reasoning

Country:
- South America
  - Chile (0.04)
  - Argentina (0.04)
- North America
  - United States
    - Maryland (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > Los Angeles County
      - Long Beach (0.04)
    - Alaska > Anchorage Municipality
      - Anchorage (0.04)
  - Canada
    - British Columbia > Vancouver (0.04)
    - Ontario (0.04)
- Europe
  - Italy (0.04)
  - United Kingdom > England (0.04)
  - France (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
- Asia
  - Japan (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
    - Israel > Southern District
      - Ashkelon (0.04)
  - China
    - Jiangsu Province > Nanjing (0.04)
    - Hong Kong (0.04)
- Africa > Middle East
  - Tunisia (0.04)

Genre:
- Research Report (1.00)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Cognitive Science > Problem Solving (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)