geolocation
GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents
Zhang, Xinyu, Wu, Yixin, Zhang, Boyang, Lin, Chenhao, Shen, Chao, Backes, Michael, Zhang, Yang
Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and lacked generalization, the rise of Large Vision Language Models (LVLMs) now enables accurate geolocation even for ordinary users. However, existing approaches are not optimized for this task. To explore the full potential and associated privacy risks, we present GEO-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a four-step procedure that adaptively selects strategies based on image difficulty and is equipped with specialized tools such as visual reverse search, which emulates how humans gather external geographic clues. Experimental results show that GEO-Detective outperforms baseline LVLMs overall, particularly on images lacking visible geographic features. In country-level geolocation tasks, it achieves an improvement of over 11.1% compared to baseline LLMs, and even at finer-grained levels, it still provides around a 5.2% performance gain. Meanwhile, when equipped with external clues, GEO-Detective becomes more likely to produce accurate predictions, reducing the "unknown" prediction rate by more than 50.6%. We further explore multiple defense strategies and find that GEO-Detective exhibits stronger robustness, highlighting the need for more effective privacy safeguards.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California (0.04)
- Europe (0.04)
- (3 more...)
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Jakubik, Johannes, Yang, Felix, Blumenstiel, Benedikt, Scheurer, Erik, Sedona, Rocco, Maurogiovanni, Stefano, Bosmans, Jente, Dionelis, Nikolaos, Marsocci, Valerio, Kopp, Niklas, Ramachandran, Rahul, Fraccaro, Paolo, Brunschwiler, Thomas, Cavallaro, Gabriele, Bernabe-Moreno, Juan, Longépé, Nicolas
We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On a token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on a pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM) -- the capability of generating additional artificial data during finetuning and inference to improve the model output -- and (iii) TerraMind achieves beyond state-of-the-art performance in community-standard benchmarks for EO like PANGAEA. The pretraining dataset, the model weights, and our code are open-sourced under a permissive license.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California (0.14)
- Asia > Singapore (0.04)
- (12 more...)
Validating Terrain Models in Digital Twins for Trustworthy sUAS Operations
Bernal, Arturo Miguel Russell, Petterson, Maureen, Granadeno, Pedro Antonio Alarcon, Murphy, Michael, Mason, James, Cleland-Huang, Jane
With the increasing deployment of small Unmanned Aircraft Systems (sUAS) in unfamiliar and complex environments, Environmental Digital Twins (EDT) that comprise weather, airspace, and terrain data are critical for safe flight planning and for maintaining appropriate altitudes during search and surveillance operations. With the expansion of sUAS capabilities through edge and cloud computing, accurate EDT are also vital for advanced sUAS capabilities, like geolocation. However, real-world sUAS deployment introduces significant sources of uncertainty, necessitating a robust validation process for EDT components. This paper focuses on the validation of terrain models, one of the key components of an EDT, for real-world sUAS tasks. These models are constructed by fusing U.S. Geological Survey (USGS) datasets and satellite imagery, incorporating high-resolution environmental data to support mission tasks. Validating both the terrain models and their operational use by sUAS under real-world conditions presents significant challenges, including limited data granularity, terrain discontinuities, GPS and sensor inaccuracies, visual detection uncertainties, as well as onboard resources and timing constraints. We propose a 3-Dimensions validation process grounded in software engineering principles, following a workflow across granularity of tests, simulation to real world, and the analysis of simple to edge conditions. We demonstrate our approach using a multi-sUAS platform equipped with a Terrain-Aware Digital Shadow. As swarms of small Unmanned Aircraft Systems (sUAS) are increasingly deployed in complex, unstructured environments such as disaster zones, wilderness areas, and wildfire regions, the need for accurate environmental models becomes critical. Effective sUAS mission planning requires awareness not only of dynamic airspace and weather conditions but also of the underlying terrain.
In such settings, terrain is often the dominant factor influencing flight safety, sensor placement, line-of-sight communications, and search effectiveness. This paper focuses specifically on the role of terrain models that enable mission-level decision-making and flight planning for sUAS operations. However, terrain inaccuracies or blind spots, such as missing elevation data, undetected peaks, or mismatched georeferencing, can result in ineffective or even hazardous behavior by autonomous vehicles. To minimize these issues, we construct and maintain a terrain model by fusing multiple sources of environmental data, including public USGS datasets [1], [2], and satellite imagery [3].
- North America > United States > Indiana > Saint Joseph County > South Bend (0.05)
- North America > United States > Michigan (0.04)
- North America > United States > Oklahoma (0.04)
- (2 more...)
- Research Report (0.50)
- Workflow (0.48)
- Transportation > Air (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Aerospace & Defense > Aircraft (0.86)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.54)
- North America > United States > Indiana (0.05)
- South America (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- (4 more...)
- Health & Medicine > Consumer Health (1.00)
- Leisure & Entertainment > Sports > Track & Field (0.94)
- Consumer Products & Services (0.94)
Audio Geolocation: A Natural Sounds Benchmark
Chasmai, Mustafa, Liu, Wuao, Maji, Subhransu, Van Horn, Grant
Can we determine someone's geographic location purely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? We tackle the challenge of global-scale audio geolocation, formalize the problem, and conduct an in-depth analysis with wildlife audio from the iNatSounds dataset. Adopting a vision-inspired approach, we convert audio recordings to spectrograms and benchmark existing image geolocation techniques. We hypothesize that species vocalizations offer strong geolocation cues due to their defined geographic ranges and propose an approach that integrates species range prediction with retrieval-based geolocation. We further evaluate whether geolocation improves when analyzing species-rich recordings or when aggregating across spatiotemporal neighborhoods. Finally, we introduce case studies from movies to explore multimodal geolocation using both audio and visual content. Our work highlights the advantages of integrating audio and visual cues, and sets the stage for future research in audio geolocation.
- Leisure & Entertainment (1.00)
- Media > Film (0.93)
- Information Technology > Artificial Intelligence > Speech (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Communications > Social Media (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
GeoLocSFT: Efficient Visual Geolocation via Supervised Fine-Tuning of Multimodal Foundation Models
Accurately determining the geographic location where a single image was taken, visual geolocation, remains a formidable challenge due to the planet's vastness and the deceptive similarity among distant locations. We introduce GeoLocSFT, a framework that demonstrates how targeted supervised fine-tuning (SFT) of a large multimodal foundation model (Gemma 3) using a small, high-quality dataset can yield highly competitive geolocation performance. GeoLocSFT is trained with only 2700 carefully selected image-GPS pairs from our geographically diverse MR600k dataset. Despite this limited data, our SFT-centric approach substantially improves over baseline models and achieves robust results on standard benchmarks such as Im2GPS-3k and YFCC-4k, as well as on our newly proposed and challenging MR40k benchmark, aimed specifically at sparsely populated regions. Further, we explore multi-candidate inference and aggregation strategies but find that the core gains are already realized at the SFT stage. Our findings highlight the power of high-quality supervision and efficient SFT for planet-scale image geolocation, especially when compared to prior methods that require massive databases or complex pipelines. To foster further research, we publicly release the MR40k benchmark dataset.
- Europe > Austria (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Europe > Spain (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Evaluating Precise Geolocation Inference Capabilities of Vision Language Models
Jay, Neel, Nguyen, Hieu Minh, Hoang, Trung Dung, Haimes, Jacob
The prevalence of Vision-Language Models (VLMs) raises important questions about privacy in an era where visual information is increasingly available. While foundation VLMs demonstrate broad knowledge and learned capabilities, we specifically investigate their ability to infer geographic location from previously unseen image data. This paper introduces a benchmark dataset collected from Google Street View that represents its global distribution of coverage. Foundation models are evaluated on single-image geolocation inference, with many achieving median distance errors of <300 km. We further evaluate VLM "agents" with access to supplemental tools, observing up to a 30.6% decrease in distance error. Our findings establish that modern foundation VLMs can act as powerful image geolocation tools, without being specifically trained for this task. When coupled with increasing accessibility of these models, our findings have greater implications for online privacy. We discuss these risks, as well as future work in this area.
- Europe > United Kingdom > England (0.14)
- Asia > Vietnam > Hanoi > Hanoi (0.05)
- South America > Brazil (0.04)
- (11 more...)
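The median distance error reported above can be illustrated with a short sketch. This is not the benchmark's own evaluation code: it assumes predictions and ground truth are (latitude, longitude) pairs in degrees and that distance means great-circle distance on a spherical Earth. The function names and the radius constant are hypothetical choices made for this example.

```python
import math
from statistics import median

# Assumed mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_distance_error_km(predictions, ground_truth):
    """Median haversine error over paired (lat, lon) predictions and targets."""
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [haversine_km(p[0], p[1], t[0], t[1])
              for p, t in zip(predictions, ground_truth)]
    return median(errors)


def relative_decrease(baseline_error, improved_error):
    """Fractional reduction in error relative to a baseline (0.306 = 30.6%)."""
    return (baseline_error - improved_error) / baseline_error
```

The median is less sensitive than the mean to a handful of predictions that land on the wrong continent, which is one plausible reason to report it for this task. `relative_decrease` shows how a figure such as a 30.6% decrease in distance error would be derived from two error values.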
HMCGeo: IP Region Prediction Based on Hierarchical Multi-label Classification
Zhao, Tianzi, Liu, Xinran, Zhang, Zhaoxin, Zhao, Dong, Li, Ning, Zhang, Zhichao, Wang, Xinye
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China. Emails: {23b903088, zhangzhaoxin, 22s030153, li.ning, 22b303010}@stu.hit.edu.cn. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China. Email: xinran_Liu@bupt.edu.cn
Fine-grained IP geolocation plays a critical role in applications such as location-based services and cybersecurity. Most existing fine-grained IP geolocation methods are regression-based; however, due to noise in the input data, these methods typically encounter kilometer-level prediction errors and provide incorrect region information for users. To address this issue, this paper proposes a novel hierarchical multi-label classification framework for IP region prediction, named HMCGeo. This framework treats IP geolocation as a hierarchical multi-label classification problem and employs residual connection-based feature extraction and attention prediction units to predict the target host region across multiple geographical granularities. Furthermore, we introduce probabilistic classification loss during training, combining it with hierarchical cross-entropy loss to form a composite loss function. IP region prediction experiments on the New York, Los Angeles, and Shanghai datasets demonstrate that HMCGeo achieves superior performance across all geographical granularities, significantly outperforming existing IP geolocation methods. IP geolocation is a technique used to predict the geographical location of a host based on its IP address [1], playing a crucial role in location-based services, network topology optimization, and cybersecurity [2], [3], [4], [5], [6], [7], [8]. Using IP geolocation technology, online services and applications infer the geographical location of users to deliver localized weather updates, news, and event notifications [3].
Internet service providers (ISPs) estimate the approximate location of target hosts to optimize traffic transmission paths, reduce network latency, and improve transmission efficiency [4]. Network analysts examine the geographical origins of incoming traffic to assess security threats from suspicious addresses. This research was supported by the National Key R&D Program of China (2024QY1103, 2018YFB18002). Based on the accuracy of prediction results, IP geolocation is categorized into coarse-grained and fine-grained geolocation. Coarse-grained IP geolocation predicts the location of a target host by utilizing allocation information such as Autonomous System Numbers (ASN), ISP, and BGP, or by analyzing the relationship between latency and distance. These methods construct geolocation databases that provide location information at the country or city level. Building on this foundation, fine-grained IP geolocation reduces prediction errors to a few kilometers in certain regions by leveraging richer landmarks or employing more effective prediction methods.
- Asia > China > Heilongjiang Province > Harbin (0.44)
- Asia > China > Beijing > Beijing (0.44)
- Asia > China > Shanghai > Shanghai (0.26)
- North America > United States > California > Los Angeles County > Los Angeles (0.25)
A Novel End-To-End Event Geolocation Method Leveraging Hyperbolic Space and Toponym Hierarchies
Timely detection and geolocation of events based on social data can provide critical information for applications such as crisis response and resource allocation. However, most existing methods are greatly affected by event detection errors, leading to insufficient geolocation accuracy. To this end, this paper proposes a novel end-to-end event geolocation method (GTOP) leveraging hyperbolic space and toponym hierarchies. Specifically, the proposed method contains one event detection module and one geolocation module. The event detection module constructs a heterogeneous information network based on social data, then constructs a homogeneous message graph and combines it with the text and time features of each message to learn initial node features. Node features are updated in hyperbolic space and then fed into a classifier for event detection. To reduce the geolocation error, this paper proposes a noise toponym filtering algorithm (HIST) based on the hierarchical structure of toponyms. HIST analyzes the hierarchical structure of toponyms mentioned in the event cluster, taking the most frequent city-level locations as the coarse-grained locations for events. To further improve the geolocation accuracy, we propose a fine-grained pseudo toponym generation algorithm (FIT) based on the output of HIST, and combine the generated pseudo toponyms with the filtered toponyms to locate events at the geographic center point of the combined toponyms. Extensive experiments are conducted on the Chinese dataset constructed in this paper and another public English dataset. The experimental results show that the proposed method is superior to the state-of-the-art baselines.
- North America > United States > Texas > Dallas County > Dallas (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- South America > Brazil > Ceará > Fortaleza (0.04)
- (15 more...)
- Information Technology (0.68)
- Law (0.68)
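The two geometric ideas in the GTOP abstract above, keeping toponyms from the most frequent city and locating an event at the geographic center of the remaining points, can be sketched as follows. This is an illustration only and not the HIST or FIT algorithms, whose details the abstract does not give. The input format (city, latitude, longitude), the function names, and the choice to average unit vectors on a sphere are all assumptions.

```python
import math
from collections import Counter


def filter_by_dominant_city(mentions):
    """Keep only mentions whose city label is the most frequent one.

    mentions: list of (city, lat_deg, lon_deg) tuples (hypothetical format).
    Returns (dominant_city, [(lat_deg, lon_deg), ...]).
    """
    if not mentions:
        raise ValueError("no toponym mentions given")
    counts = Counter(city for city, _, _ in mentions)
    dominant, _ = counts.most_common(1)[0]
    kept = [(lat, lon) for city, lat, lon in mentions if city == dominant]
    return dominant, kept


def geographic_center(points):
    """Spherical centroid of (lat, lon) points in degrees.

    Averages the points as 3D unit vectors and projects the mean back onto
    the sphere, so longitudes near +/-180 degrees are handled correctly.
    """
    if not points:
        raise ValueError("no points given")
    x = y = z = 0.0
    for lat, lon in points:
        phi, lmb = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lmb)
        y += math.cos(phi) * math.sin(lmb)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    if math.sqrt(x * x + y * y + z * z) < 1e-12:
        raise ValueError("center is undefined for this set of points")
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon
```

Averaging unit vectors rather than raw latitude and longitude values avoids the wrap-around error at the antimeridian: two points at longitudes 179 and -179 yield a center near 180 rather than 0.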
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Dufour, Nicolas, Picard, David, Kalogeiton, Vicky, Landrieu, Loic
Global visual geolocation predicts where an image was captured on Earth. Since images vary in how precisely they can be localized, this task inherently involves a significant degree of ambiguity. However, existing approaches are deterministic and overlook this aspect. In this paper, we aim to close the gap between traditional geolocalization and modern generative methods. We propose the first generative geolocation approach based on diffusion and Riemannian flow matching, where the denoising process operates directly on the Earth's surface. Our model achieves state-of-the-art performance on three visual geolocation benchmarks: OpenStreetView-5M, YFCC-100M, and iNat21. In addition, we introduce the task of probabilistic visual geolocation, where the model predicts a probability distribution over all possible locations instead of a single point. We introduce new metrics and baselines for this task, demonstrating the advantages of our diffusion-based approach. Code and models will be made available.
- Oceania > Australia (0.04)
- North America > United States > Maryland (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)