AITopics | geolocation


LocDiff: Identifying Locations on Earth by Diffusing in the Hilbert Space

Neural Information Processing Systems, Jun-23-2026, 14:23:32 GMT

Image geolocalization is a fundamental yet challenging task, aiming at inferring the geolocation on Earth where an image is taken. State-of-the-art methods employ either grid-based classification or gallery-based image-location retrieval, whose spatial generalizability significantly suffers if the spatial distribution of test images does not align with the choices of grids and galleries. Recently emerging generative approaches, while getting rid of grids and galleries, use raw geographical coordinates and suffer quality losses due to their lack of multi-scale information. To address these limitations, we propose a multi-scale latent diffusion model called LocDiff for image geolocalization. We developed a novel positional encoding-decoding framework called Spherical Harmonics Dirac Delta (SHDD) Representations, which encodes points on a spherical surface (e.g., geolocations on Earth) into a Hilbert space of Spherical Harmonics coefficients and decodes points (geolocations) by mode-seeking on spherical probability distributions. We also propose a novel SirenNet-based architecture (CS-UNet) to learn an image-based conditional backward process in the latent SHDD space by minimizing a latent KL-divergence loss. To the best of our knowledge, LocDiff is the first image geolocalization model that performs latent diffusion in a multi-scale location encoding space and generates geolocations under the guidance of images. Experimental results show that LocDiff can outperform all state-of-the-art grid-based, retrieval-based, and diffusion-based baselines across 5 challenging global-scale image geolocalization datasets, and demonstrates significantly stronger generalizability to unseen geolocations.
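
As a rough illustration of the SHDD idea described above, the sketch below encodes a latitude/longitude pair as the coefficients of a spherical Dirac delta truncated at degree 2 (nine real spherical harmonics) and decodes it by a brute-force grid search for the mode. It is a toy under stated assumptions, not the LocDiff implementation: the degree cutoff, the names `encode_location` and `decode_location`, and the 1-degree search grid are all illustrative choices, and the grid search merely stands in for the mode-seeking step mentioned in the abstract.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def real_sh_basis(x, y, z):
    """Real spherical harmonics up to degree 2, evaluated at a unit vector."""
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    return [
        0.5 * math.sqrt(1.0 / math.pi),                         # degree 0
        c1 * y, c1 * z, c1 * x,                                 # degree 1
        c2 * x * y, c2 * y * z,                                 # degree 2, m = -2, -1
        0.25 * math.sqrt(5.0 / math.pi) * (3.0 * z * z - 1.0),  # degree 2, m = 0
        c2 * x * z,                                             # degree 2, m = 1
        0.25 * math.sqrt(15.0 / math.pi) * (x * x - y * y),     # degree 2, m = 2
    ]

def encode_location(lat_deg, lon_deg):
    """Coefficients of a delta at the point: the basis evaluated at that point."""
    return real_sh_basis(*to_unit_vector(lat_deg, lon_deg))

def decode_location(coeffs, step_deg=1):
    """Assumed decoder: pick the grid point where the reconstructed function peaks."""
    best_score, best_latlon = -math.inf, None
    for lat in range(-90, 91, step_deg):
        for lon in range(-180, 180, step_deg):
            basis = real_sh_basis(*to_unit_vector(lat, lon))
            score = sum(c * b for c, b in zip(coeffs, basis))
            if score > best_score:
                best_score, best_latlon = score, (lat, lon)
    return best_latlon

coeffs = encode_location(48.8566, 2.3522)  # example coordinates (Paris)
print(decode_location(coeffs))
```

With only nine coefficients the reconstructed function is a very broad bump, so the decoded point is limited by the grid resolution rather than by the encoding. A higher degree cutoff would sharpen the peak, which is the multi-scale aspect the abstract refers to.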

artificial intelligence, machine learning, proceedings, (6 more...)


Genre: Research Report (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.98)


GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents

Zhang, Xinyu, Wu, Yixin, Zhang, Boyang, Lin, Chenhao, Shen, Chao, Backes, Michael, Zhang, Yang

arXiv.org Artificial Intelligence, Dec-1-2025

Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and lacked generalization, the rise of Large Vision Language Models (LVLMs) now enables accurate geolocation even for ordinary users. However, existing approaches are not optimized for this task. To explore the full potential and associated privacy risks, we present GEO-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a four-step procedure that adaptively selects strategies based on image difficulty and is equipped with specialized tools such as visual reverse search, which emulates how humans gather external geographic clues. Experimental results show that GEO-Detective outperforms baseline LVLMs overall, particularly on images lacking visible geographic features. In country-level geolocation tasks, it achieves an improvement of over 11.1% compared to baseline LLMs, and even at finer-grained levels, it still provides around a 5.2% performance gain. Meanwhile, when equipped with external clues, GEO-Detective becomes more likely to produce accurate predictions, reducing the "unknown" prediction rate by more than 50.6%. We further explore multiple defense strategies and find that GEO-Detective exhibits stronger robustness, highlighting the need for more effective privacy safeguards.

accuracy, large language model, machine learning, (20 more...)


2511.22441

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


TerraMind: Large-Scale Generative Multimodality for Earth Observation

Jakubik, Johannes, Yang, Felix, Blumenstiel, Benedikt, Scheurer, Erik, Sedona, Rocco, Maurogiovanni, Stefano, Bosmans, Jente, Dionelis, Nikolaos, Marsocci, Valerio, Kopp, Niklas, Ramachandran, Rahul, Fraccaro, Paolo, Brunschwiler, Thomas, Cavallaro, Gabriele, Bernabe-Moreno, Juan, Longépé, Nicolas

arXiv.org Artificial Intelligence, Sep-11-2025

We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On a token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on a pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM) -- the capability of generating additional artificial data during finetuning and inference to improve the model output -- and (iii) TerraMind achieves beyond state-of-the-art performance in community-standard benchmarks for EO like PANGAEA. The pretraining dataset, the model weights, and our code are open-sourced under a permissive license.

large language model, machine learning, natural language, (19 more...)


2504.11171

Country:

Europe (1.00)
Asia (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)


Validating Terrain Models in Digital Twins for Trustworthy sUAS Operations

Bernal, Arturo Miguel Russell, Petterson, Maureen, Granadeno, Pedro Antonio Alarcon, Murphy, Michael, Mason, James, Cleland-Huang, Jane

arXiv.org Artificial Intelligence, Aug-25-2025

With the increasing deployment of small Unmanned Aircraft Systems (sUAS) in unfamiliar and complex environments, Environmental Digital Twins (EDT) that comprise weather, airspace, and terrain data are critical for safe flight planning and for maintaining appropriate altitudes during search and surveillance operations. With the expansion of sUAS capabilities through edge and cloud computing, accurate EDT are also vital for advanced sUAS capabilities, like geolocation. However, real-world sUAS deployment introduces significant sources of uncertainty, necessitating a robust validation process for EDT components. This paper focuses on the validation of terrain models, one of the key components of an EDT, for real-world sUAS tasks. These models are constructed by fusing U.S. Geological Survey (USGS) datasets and satellite imagery, incorporating high-resolution environmental data to support mission tasks. Validating both the terrain models and their operational use by sUAS under real-world conditions presents significant challenges, including limited data granularity, terrain discontinuities, GPS and sensor inaccuracies, visual detection uncertainties, as well as onboard resources and timing constraints. We propose a 3-Dimensions validation process grounded in software engineering principles, following a workflow across granularity of tests, simulation to real world, and the analysis of simple to edge conditions. We demonstrate our approach using a multi-sUAS platform equipped with a Terrain-Aware Digital Shadow.

As swarms of small Unmanned Aircraft Systems (sUAS) are increasingly deployed in complex, unstructured environments such as disaster zones, wilderness areas, and wildfire regions, the need for accurate environmental models becomes critical. Effective sUAS mission planning requires awareness not only of dynamic airspace and weather conditions but also of the underlying terrain. In such settings, terrain is often the dominant factor influencing flight safety, sensor placement, line-of-sight communications, and search effectiveness. This paper focuses specifically on the role of terrain models that enable mission-level decision-making and flight planning for sUAS operations. However, terrain inaccuracies or blind spots, such as missing elevation data, undetected peaks, or mismatched georeferencing, can result in ineffective or even hazardous behavior by autonomous vehicles. To minimize these issues, we construct and maintain a terrain model by fusing multiple sources of environmental data, including public USGS datasets [1], [2], and satellite imagery [3].
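
To make the kinds of terrain-model checks mentioned above more concrete, here is a minimal sketch that flags missing elevation cells and waypoints that fly too close to the ground. Everything in it is an assumption for illustration: the grid-of-lists terrain layout, the `(row, col, altitude_m)` waypoint format, and the function names are hypothetical and are not taken from the paper's validation process.

```python
def find_missing_cells(elevation_grid):
    """Return (row, col) indices of cells with no elevation value (None)."""
    return [(r, c)
            for r, row in enumerate(elevation_grid)
            for c, value in enumerate(row)
            if value is None]

def clearance_violations(elevation_grid, waypoints, min_clearance_m):
    """Flag waypoints that are over unknown terrain or below the required clearance.

    waypoints: iterable of (row, col, altitude_m) tuples (hypothetical format).
    """
    violations = []
    for row, col, altitude_m in waypoints:
        ground_m = elevation_grid[row][col]
        if ground_m is None:
            violations.append((row, col, "missing elevation"))
        elif altitude_m - ground_m < min_clearance_m:
            violations.append((row, col, "insufficient clearance"))
    return violations

# Hypothetical 2x2 terrain tile in metres; None marks a data gap.
grid = [[100.0, 120.0],
        [None, 300.0]]
plan = [(0, 0, 150.0), (0, 1, 130.0), (1, 0, 400.0), (1, 1, 360.0)]
print(find_missing_cells(grid))
print(clearance_violations(grid, plan, min_clearance_m=30.0))
```

Treating a missing cell as a violation rather than skipping it is a deliberately conservative choice here, since the text identifies missing elevation data as a source of hazardous behavior.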

artificial intelligence, machine learning, terrain model, (18 more...)


2508.16104

Country: North America > United States > Indiana (0.14)

Genre:

Research Report (0.50)
Workflow (0.48)

Industry:

Transportation > Air (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Aerospace & Defense > Aircraft (0.86)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


c28e5b0c9841b5ef396f9f519bf6c217-Supplemental.pdf

Neural Information Processing Systems, Aug-16-2025, 06:14:37 GMT

dataset, football, video, (7 more...)


Country:

North America > United States > Indiana (0.05)
South America (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
(4 more...)

Industry:

Health & Medicine > Consumer Health (1.00)
Leisure & Entertainment > Sports > Track & Field (0.94)
Consumer Products & Services (0.94)

Technology: Information Technology > Artificial Intelligence (0.69)


Audio Geolocation: A Natural Sounds Benchmark

Chasmai, Mustafa, Liu, Wuao, Maji, Subhransu, Van Horn, Grant

arXiv.org Artificial Intelligence, Jul-23-2025

Can we determine someone's geographic location purely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? We tackle the challenge of global-scale audio geolocation, formalize the problem, and conduct an in-depth analysis with wildlife audio from the iNatSounds dataset. Adopting a vision-inspired approach, we convert audio recordings to spectrograms and benchmark existing image geolocation techniques. We hypothesize that species vocalizations offer strong geolocation cues due to their defined geographic ranges and propose an approach that integrates species range prediction with retrieval-based geolocation. We further evaluate whether geolocation improves when analyzing species-rich recordings or when aggregating across spatiotemporal neighborhoods. Finally, we introduce case studies from movies to explore multimodal geolocation using both audio and visual content. Our work highlights the advantages of integrating audio and visual cues, and sets the stage for future research in audio geolocation.
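
The audio-to-spectrogram step mentioned above can be sketched as a short-time Fourier transform. The version below is a minimal, dependency-free illustration using a naive DFT and a Hann window. The frame size, hop length, and function name are assumptions, and the paper may well use a different representation (for example mel-scaled spectrograms), so treat this as a generic sketch rather than the authors' preprocessing.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Naive O(N^2) DFT over the non-negative frequency bins.
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

# Synthetic 1 kHz tone sampled at 8 kHz (illustrative input, not real wildlife audio).
sample_rate, tone_hz = 8000, 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]
spec = spectrogram(signal)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(len(spec), peak_bins)
```

The resulting list of frames can be treated as a 2D image (time by frequency), which is what lets image geolocation techniques be applied to audio.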

large language model, machine learning, natural language, (22 more...)


2505.18726

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Communications > Social Media (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)


GeoLocSFT: Efficient Visual Geolocation via Supervised Fine-Tuning of Multimodal Foundation Models

Yi, Qiang, Shan, Lianlei

arXiv.org Artificial Intelligence, Jun-3-2025

Accurately determining the geographic location where a single image was taken, visual geolocation, remains a formidable challenge due to the planet's vastness and the deceptive similarity among distant locations. We introduce GeoLocSFT, a framework that demonstrates how targeted supervised fine-tuning (SFT) of a large multimodal foundation model (Gemma 3) using a small, high-quality dataset can yield highly competitive geolocation performance. GeoLocSFT is trained with only 2700 carefully selected image-GPS pairs from our geographically diverse MR600k dataset. Despite this limited data, our SFT-centric approach substantially improves over baseline models and achieves robust results on standard benchmarks such as Im2GPS-3k and YFCC-4k, as well as on our newly proposed and challenging MR40k benchmark, aimed specifically at sparsely populated regions. Further, we explore multi-candidate inference and aggregation strategies but find that the core gains are already realized at the SFT stage. Our findings highlight the power of high-quality supervision and efficient SFT for planet-scale image geolocation, especially when compared to prior methods that require massive databases or complex pipelines. To foster further research, we publicly release the MR40k benchmark dataset.

large language model, machine learning, natural language, (19 more...)


2506.01277

Country:

Europe (0.68)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
(2 more...)


Evaluating Precise Geolocation Inference Capabilities of Vision Language Models

Jay, Neel, Nguyen, Hieu Minh, Hoang, Trung Dung, Haimes, Jacob

arXiv.org Artificial Intelligence, Feb-20-2025

The prevalence of Vision-Language Models (VLMs) raises important questions about privacy in an era where visual information is increasingly available. While foundation VLMs demonstrate broad knowledge and learned capabilities, we specifically investigate their ability to infer geographic location from previously unseen image data. This paper introduces a benchmark dataset collected from Google Street View that represents its global distribution of coverage. Foundation models are evaluated on single-image geolocation inference, with many achieving median distance errors of <300 km. We further evaluate VLM "agents" with access to supplemental tools, observing up to a 30.6% decrease in distance error. Our findings establish that modern foundation VLMs can act as powerful image geolocation tools, without being specifically trained for this task. When coupled with increasing accessibility of these models, our findings have greater implications for online privacy. We discuss these risks, as well as future work in this area.
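
The median distance error reported above is commonly computed from great-circle distances between predicted and true coordinates. The sketch below shows one way to do that with the haversine formula. The mean Earth radius of 6371 km and the function names are assumptions, and the paper's exact evaluation code is not given in this listing.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def median_distance_error_km(predictions, ground_truth):
    """Median of per-image distances between predicted and true coordinates."""
    errors = [haversine_km(p[0], p[1], g[0], g[1])
              for p, g in zip(predictions, ground_truth)]
    return median(errors)

# Hypothetical predictions and labels, for illustration only.
preds = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
truth = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
print(median_distance_error_km(preds, truth))
```

The median is less sensitive than the mean to a few predictions that land on the wrong continent, which is one reason it is a natural summary for this task.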

agent, category, street view, (16 more...)


2502.14412

Country:

Europe > United Kingdom > England (0.14)
Asia > Vietnam > Hanoi > Hanoi (0.05)
South America > Brazil (0.04)
(11 more...)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)


HMCGeo: IP Region Prediction Based on Hierarchical Multi-label Classification

Zhao, Tianzi, Liu, Xinran, Zhang, Zhaoxin, Zhao, Dong, Li, Ning, Zhang, Zhichao, Wang, Xinye

arXiv.org Artificial Intelligence, Jan-26-2025

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China. Emails: {23b903088, zhangzhaoxin, 22s030153, li.ning, 22b303010}@stu.hit.edu.cn
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China. Email: xinran_Liu@bupt.edu.cn

Abstract: Fine-grained IP geolocation plays a critical role in applications such as location-based services and cybersecurity. Most existing fine-grained IP geolocation methods are regression-based; however, due to noise in the input data, these methods typically encounter kilometer-level prediction errors and provide incorrect region information for users. To address this issue, this paper proposes a novel hierarchical multi-label classification framework for IP region prediction, named HMCGeo. This framework treats IP geolocation as a hierarchical multi-label classification problem and employs residual connection-based feature extraction and attention prediction units to predict the target host region across multiple geographical granularities. Furthermore, we introduce probabilistic classification loss during training, combining it with hierarchical cross-entropy loss to form a composite loss function. IP region prediction experiments on the New York, Los Angeles, and Shanghai datasets demonstrate that HMCGeo achieves superior performance across all geographical granularities, significantly outperforming existing IP geolocation methods.

IP geolocation is a technique used to predict the geographical location of a host based on its IP address [1], playing a crucial role in location-based services, network topology optimization, and cybersecurity [2], [3], [4], [5], [6], [7], [8]. Using IP geolocation technology, online services and applications infer the geographical location of users to deliver localized weather updates, news, and event notifications [3]. Internet service providers (ISPs) estimate the approximate location of target hosts to optimize traffic transmission paths, reduce network latency, and improve transmission efficiency [4]. Network analysts examine the geographical origins of incoming traffic to assess security threats from suspicious addresses. Based on the accuracy of prediction results, IP geolocation is categorized into coarse-grained and fine-grained geolocation. Coarse-grained IP geolocation predicts the location of a target host by utilizing allocation information such as Autonomous System Numbers (ASN), ISP, and BGP, or by analyzing the relationship between latency and distance. These methods construct geolocation databases that provide location information at the country or city level. Building on this foundation, fine-grained IP geolocation reduces prediction errors to a few kilometers in certain regions by leveraging richer landmarks or employing more effective prediction methods.

This research was supported by the National Key R&D Program of China (2024QY1103, 2018YFB18002).
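
One plausible reading of the hierarchical cross-entropy term mentioned in the abstract is a weighted sum of ordinary cross-entropy losses, one per geographical granularity. The sketch below implements that reading only. The per-level weights, the function names, and the omission of the probabilistic classification loss are all assumptions, so this should not be taken as HMCGeo's actual composite loss.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[target_index])

def hierarchical_cross_entropy(level_logits, level_targets, level_weights=None):
    """Weighted sum of per-level cross-entropies, ordered coarse to fine (assumed form)."""
    if level_weights is None:
        level_weights = [1.0] * len(level_logits)
    return sum(w * cross_entropy(logits, target)
               for w, logits, target in zip(level_weights, level_logits, level_targets))

# Hypothetical two-level hierarchy: 2 coarse regions, 4 fine regions.
coarse_logits = [2.0, 0.5]
fine_logits = [0.1, 1.5, 0.3, 0.2]
print(hierarchical_cross_entropy([coarse_logits, fine_logits], [0, 1]))
```

Summing losses across levels means a prediction that is wrong at the finest level but right at coarser ones is penalized less than one that is wrong everywhere, which matches the motivation of predicting regions at multiple granularities.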

artificial intelligence, granularity, machine learning, (19 more...)


2501.16392

Country:

Asia > China > Heilongjiang Province > Harbin (0.44)
Asia > China > Beijing > Beijing (0.44)
North America > United States > California > Los Angeles County > Los Angeles (0.25)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Geographic Information Systems (1.00)
Information Technology > Communications > Networks (1.00)
(2 more...)


A Novel End-To-End Event Geolocation Method Leveraging Hyperbolic Space and Toponym Hierarchies

Qiao, Yaqiong, Huang, Guojun

arXiv.org Artificial Intelligence, Dec-14-2024

Abstract: Timely detection and geolocation of events based on social data can provide critical information for applications such as crisis response and resource allocation. However, most existing methods are greatly affected by event detection errors, leading to insufficient geolocation accuracy. To this end, this paper proposes a novel end-to-end event geolocation method (GTOP) leveraging hyperbolic space and toponym hierarchies. Specifically, the proposed method contains one event detection module and one geolocation module. The event detection module constructs a heterogeneous information network based on social data, then constructs a homogeneous message graph and combines it with the text and time features of the messages to learn initial node features. Node features are updated in hyperbolic space and then fed into a classifier for event detection. To reduce the geolocation error, this paper proposes a noise toponym filtering algorithm (HIST) based on the hierarchical structure of toponyms. HIST analyzes the hierarchical structure of toponyms mentioned in the event cluster, taking the most frequent city-level locations as the coarse-grained locations for events. To further improve the geolocation accuracy, we propose a fine-grained pseudo toponym generation algorithm (FIT) based on the output of HIST, and combine the generated pseudo toponyms with the filtered toponyms to locate events at the geographic center point of the combined toponyms. Extensive experiments are conducted on the Chinese dataset constructed in this paper and another public English dataset. The experimental results show that the proposed method is superior to the state-of-the-art baselines.
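
Two of the steps described above, picking the most frequent city-level toponym and locating an event at the geographic center of a set of toponym coordinates, can be sketched as follows. This is a simplified illustration under assumptions: the function names and sample data are hypothetical, the center is computed by averaging unit vectors on the sphere, and none of the hierarchy analysis or pseudo toponym generation in HIST and FIT is reproduced.

```python
import math
from collections import Counter

def most_frequent_city(city_mentions):
    """Pick the most frequently mentioned city as the coarse-grained event location."""
    return Counter(city_mentions).most_common(1)[0][0]

def geographic_center(points):
    """Center of (lat, lon) points in degrees: average unit vectors, then convert back."""
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    count = len(points)
    x, y, z = x / count, y / count, z / count
    center_lon = math.atan2(y, x)
    center_lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(center_lat), math.degrees(center_lon)

# Hypothetical mentions and toponym coordinates for one event cluster.
mentions = ["Hangzhou", "Dallas", "Hangzhou", "Hangzhou"]
toponym_coords = [(30.25, 120.15), (30.29, 120.21), (30.21, 120.12)]
print(most_frequent_city(mentions))
print(geographic_center(toponym_coords))
```

Averaging unit vectors rather than raw latitude and longitude values avoids wrong results for points that straddle the 180-degree meridian.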

data mining, machine learning, toponym, (19 more...)


2412.1087

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
South America > Brazil > Ceará > Fortaleza (0.04)
(15 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology (0.68)
Law (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)
