Goto

Collaborating Authors

 geospatial foundation model


Utilizing a Geospatial Foundation Model for Coastline Delineation in Small Sandy Islands

Chhabra, Tishya, Bajpai, Manisha, Zesk, Walter, Tibbits, Skylar

arXiv.org Artificial Intelligence

We present an initial evaluation of NASA and IBM's Prithvi-EO-2.0 geospatial foundation model on shoreline delineation of small sandy islands using satellite images. We curated and labeled a dataset of 225 multispectral images of two Maldivian islands, which we publicly release, and fine-tuned both the 300M and 600M parameter versions of Prithvi on training subsets ranging from 5 to 181 images. Our experiments show that even with as few as 5 training images, the models achieve high performance (F1 of 0.94, IoU of 0.79). Our results demonstrate the strong transfer learning capability of Prithvi, underscoring the potential of such models to support coastal monitoring in data-poor regions.


ZeroFlood: A Geospatial Foundation Model for Data-Efficient Flood Susceptibility Mapping

Kim, Hyeongkyun, Oikonomou, Orestis

arXiv.org Artificial Intelligence

Flood susceptibility mapping (FSM) is vital for disaster prevention but remains challenging in data-scarce regions where hydrodynamic models require dense geophysical inputs. This work introduces ZeroFlood, a geospatial foundation model framework for data-efficient FSM. The approach fine-tunes Geospatial Foundation Models (GFMs) with Thinking-in-Modality (TiM) reasoning, enabling flood prediction from basic Earth observation data such as Sentinel-1 or Sentinel-2 imagery. Using paired EO and simulated flood maps from data-rich regions, ZeroFlood bridges data availability gaps through cross-modal representation learning. Experiments with TerraMind and Prithvi GFMs show that TiM enhances model robustness, with the TerraMind-Large configuration achieving an F1 score of 67.21. The results demonstrate the feasibility of foundation-model-based FSM as a scalable and data-efficient solution for flood risk management.

  Country:
  Genre: Research Report > New Finding (0.48)

Beyond AlphaEarth: Toward Human-Centered Spatial Representation via POI-Guided Contrastive Learning

Liu, Junyuan, Qin, Quan, Dong, Guangsheng, Wang, Xinglei, Feng, Jiazhuang, Zeng, Zichao, Cheng, Tao

arXiv.org Artificial Intelligence

General-purpose spatial representations are essential for building transferable geospatial foundation models (GFMs). Among them, the AlphaEarth Foundation (AE) represents a major step toward a global, unified representation of the Earth's surface, learning 10-meter embeddings from multi-source Earth Observation (EO) data that capture rich physical and environmental patterns across diverse landscapes. However, such EO-driven representations remain limited in capturing the functional and socioeconomic dimensions of cities, as they primarily encode physical and spectral patterns rather than human activities or spatial functions. We propose AETHER (AlphaEarth-POI Enriched Representation Learning), a lightweight framework that adapts AlphaEarth to human-centered urban analysis through multimodal alignment guided by Points of Interest (POIs). AETHER aligns AE embeddings with textual representations of POIs, enriching physically grounded EO features with semantic cues about urban functions and socioeconomic contexts. In Greater London, AETHER achieves consistent gains over the AE baseline, with a 7.2% relative improvement in land-use classification F1 and a 23.6% relative reduction in Kullback-Leibler divergence for socioeconomic mapping. Built upon pretrained AE, AETHER leverages a lightweight multimodal alignment to enrich it with human-centered semantics while remaining computationally efficient and scalable for urban applications. By coupling EO with human-centered semantics, it advances geospatial foundation models toward general-purpose urban representations that integrate both physical form and functional meaning. Introduction Understanding the spatial organization and functional dynamics of cities remains a long-standing challenge in GIScience and urban computing. Addressing this challenge requires spatial representations that generalize across scales, modalities, and urban contexts.


AI-Derived Structural Building Intelligence for Urban Resilience: An Application in Saint Vincent and the Grenadines

Tingzon, Isabelle, Toriumi, Yoji, Gevaert, Caroline

arXiv.org Artificial Intelligence

Detailed structural building information is used to estimate potential damage from hazard events like cyclones, floods, and landslides, making them critical for urban resilience planning and disaster risk reduction. However, such information is often unavailable in many small island developing states (SIDS) in climate-vulnerable regions like the Caribbean. T o address this data gap, we present an AIdriven workflow to automatically infer rooftop attributes from high-resolution satellite imagery, with Saint Vincent and the Grenadines as our case study. Here, we compare the utility of geospatial foundation models combined with shallow classifiers against fine-tuned deep learning models for rooftop classification. Furthermore, we assess the impact of incorporating additional training data from neighboring SIDS to improve model performance. Our best models achieve F1 scores of 0.88 and 0.83 for roof pitch and roof material classification, respectively. Combined with local capacity building, our work aims to provide SIDS with novel capabilities to harness AI and Earth Observation (EO) data to enable more efficient, evidence-based urban governance.


Detection and Simulation of Urban Heat Islands Using a Fine-Tuned Geospatial Foundation Model

Kreismann, David

arXiv.org Artificial Intelligence

As urbanization and climate change progress, urban heat island effects are becoming more frequent and severe. To formulate effective mitigation plans, cities require detailed air temperature data. However, predictive analytics methods based on conventional machine learning models and limited data infrastructure often provide inaccurate predictions, especially in underserved areas. In this context, geospatial foundation models trained on unstructured global data demonstrate strong generalization and require minimal fine-tuning, offering an alternative for predictions where traditional approaches are limited. This study fine-tunes a geospatial foundation model to predict urban land surface temperatures under future climate scenarios and explores its response to land cover changes using simulated vegetation strategies. The fine-tuned model achieved pixel-wise downscaling errors below 1.74 °C and aligned with ground truth patterns, demonstrating an extrapolation capacity up to 3.62 °C.


Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data

Si, Haozhe, Wan, Yuxuan, Do, Minh, Vasisht, Deepak, Zhao, Han, Hamann, Hendrik F.

arXiv.org Artificial Intelligence

Geospatial raster data, such as that collected by satellite-based imaging systems at different times and spectral bands, hold immense potential for enabling a wide range of high-impact applications. This potential stems from the rich information that is spatially and temporally contextualized across multiple channels and sensing modalities. Recent work has adapted existing self-supervised learning approaches for such geospatial data. However, they fall short of scalable model architectures, leading to inflexibility and computational inefficiencies when faced with an increasing number of channels and modalities. To address these limitations, we introduce Low-rank Efficient Spatial-Spectral Vision Transformer with three key innovations: i) the LESS Attention Block that approximates high-dimensional spatial-spectral attention through Kronecker's product of the low-dimensional spatial and spectral attention components; ii) the Continuous Positional-Channel Embedding Layer that preserves both the continuity and physical characteristics of each spatial-spectral patch; and iii) the Perception Field Mask that exploits local spatial dependencies by constraining attention to neighboring patches. To evaluate the proposed innovations, we construct GFM-Bench, which serves as a comprehensive benchmark for such geospatial raster data. We pretrain LESS ViT using a Hyperspectral Masked Autoencoder framework with integrated positional and channel masking strategies. Experimental results demonstrate that our proposed method achieves competitive performance against state-of-the-art multi-modal geospatial foundation models while outperforming them on cross-satellite generalization tasks with higher computational efficiency. The flexibility and extensibility of our framework make it a promising direction for future geospatial data analysis tasks that involve a wide range of modalities and channels.


Fine-tuning of Geospatial Foundation Models for Aboveground Biomass Estimation

Muszynski, Michal, Klein, Levente, da Silva, Ademir Ferreira, Atluri, Anjani Prasad, Gomes, Carlos, Szwarcman, Daniela, Singh, Gurkanwar, Gu, Kewen, Zortea, Maciel, Simumba, Naomi, Fraccaro, Paolo, Singh, Shraddha, Meliksetian, Steve, Watson, Campbell, Kimura, Daiki, Srinivasan, Harini

arXiv.org Artificial Intelligence

Global vegetation structure mapping is critical for understanding the global carbon cycle and maximizing the efficacy of nature-based carbon sequestration initiatives. Moreover, vegetation structure mapping can help reduce the impacts of climate change by, for example, guiding actions to improve water security, increase biodiversity and reduce flood risk. Global satellite measurements provide an important set of observations for monitoring and managing deforestation and degradation of existing forests, natural forest regeneration, reforestation, biodiversity restoration, and the implementation of sustainable agricultural practices. In this paper, we explore the effectiveness of fine-tuning of a geospatial foundation model to estimate above-ground biomass (AGB) using space-borne data collected across different eco-regions in Brazil. The fine-tuned model architecture consisted of a Swin-B transformer as the encoder (i.e., backbone) and a single convolutional layer for the decoder head. All results were compared to a U-Net which was trained as the baseline model Experimental results of this sparse-label prediction task demonstrate that the fine-tuned geospatial foundation model with a frozen encoder has comparable performance to a U-Net trained from scratch. This is despite the fine-tuned model having 13 times less parameters requiring optimization, which saves both time and compute resources. Further, we explore the transfer-learning capabilities of the geospatial foundation models by fine-tuning on satellite imagery with sparse labels from different eco-regions in Brazil.


GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping

Daruna, Angel, Zadorozhnyy, Vasily, Lukoczki, Georgina, Chiu, Han-Pang

arXiv.org Artificial Intelligence

Machine Learning (ML) for Mineral Prospectivity Mapping (MPM) remains a challenging problem as it requires the analysis of associations between large-scale multi-modal geospatial data and few historical mineral commodity observations (positive labels). Recent MPM works have explored Deep Learning (DL) as a modeling tool with more representation capacity. However, these overparameterized methods may be more prone to overfitting due to their reliance on scarce labeled data. While a large quantity of unlabeled geospatial data exists, no prior MPM works have considered using such information in a self-supervised manner. Our MPM approach uses a masked image modeling framework to pretrain a backbone neural network in a self-supervised manner using unlabeled geospatial data alone. After pretraining, the backbone network provides feature extraction for downstream MPM tasks. We evaluated our approach alongside existing methods to assess mineral prospectivity of Mississippi Valley Type (MVT) and Clastic-Dominated (CD) Lead-Zinc deposits in North America and Australia. Our results demonstrate that self-supervision promotes robustness in learned features, improving prospectivity predictions. Additionally, we leverage explainable artificial intelligence techniques to demonstrate that individual predictions can be interpreted from a geological perspective.