Goto

Collaborating Authors

 Geophysical Analysis & Survey


Satellite Sunroof: High-res Digital Surface Models and Roof Segmentation for Global Solar Mapping

arXiv.org Artificial Intelligence

The transition to renewable energy, particularly solar, is key to mitigating climate change. Google's Solar API aids this transition by estimating solar potential from aerial imagery, but its impact is constrained by geographical coverage. This paper proposes expanding the API's reach using satellite imagery, enabling global solar potential assessment. We tackle challenges involved in building a Digital Surface Model (DSM) and roof instance segmentation from lower resolution and single oblique views using deep learning models. Our models, trained on aligned satellite and aerial datasets, produce 25cm DSMs and roof segments. With ~1m DSM MAE on buildings, ~5deg roof pitch error and ~56% IOU on roof segmentation, they significantly enhance the Solar API's potential to promote solar adoption.


RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models

arXiv.org Artificial Intelligence

Abundant, well-annotated multimodal data in remote sensing are pivotal for aligning complex visual remote sensing (RS) scenes with human language, enabling the development of specialized vision language models across diverse RS interpretation tasks. However, annotating RS images with rich linguistic semantics at scale demands expertise in RS and substantial human labor, making it costly and often impractical. In this study, we propose a workflow that leverages large language models (LLMs) to generate multimodal datasets with semantically rich captions at scale from plain OpenStreetMap (OSM) data for images sourced from the Google Earth Engine (GEE) platform. This approach facilitates the generation of paired remote sensing data and can be readily scaled up using openly available data. Within this framework, we present RSTeller, a multimodal dataset comprising over 1 million RS images, each accompanied by multiple descriptive captions. Extensive experiments demonstrate that RSTeller enhances the performance of multiple existing vision language models for RS scene understanding through continual pre-training. Our methodology significantly reduces the manual effort and expertise needed for annotating remote sensing imagery while democratizing access to high-quality annotated data. This advancement fosters progress in visual language modeling and encourages broader participation in remote sensing research and applications. The RSTeller dataset is available at https://github.com/SlytherinGe/RSTeller.


Unified Deep Learning Model for Global Prediction of Aboveground Biomass, Canopy Height and Cover from High-Resolution, Multi-Sensor Satellite Imagery

arXiv.org Artificial Intelligence

Regular measurement of carbon stock in the world's forests is critical for carbon accounting and reporting under national and international climate initiatives, and for scientific research, but has been largely limited in scalability and temporal resolution due to a lack of ground based assessments. Increasing efforts have been made to address these challenges by incorporating remotely sensed data. We present a new methodology which uses multi-sensor, multi-spectral imagery at a resolution of 10 meters and a deep learning based model which unifies the prediction of above ground biomass density (AGBD), canopy height (CH), canopy cover (CC) as well as uncertainty estimations for all three quantities. The model is trained on millions of globally sampled GEDI-L2/L4 measurements. We validate the capability of our model by deploying it over the entire globe for the year 2023 as well as annually from 2016 to 2023 over selected areas. The model achieves a mean absolute error for AGBD (CH, CC) of 26.1 Mg/ha (3.7 m, 9.9 %) and a root mean squared error of 50.6 Mg/ha (5.4 m, 15.8 %) on a globally sampled test dataset, demonstrating a significant improvement over previously published results. We also report the model performance against independently collected ground measurements published in the literature, which show a high degree of correlation across varying conditions. We further show that our pre-trained model facilitates seamless transferability to other GEDI variables due to its multi-head architecture.


Novel Change Detection Framework in Remote Sensing Imagery Using Diffusion Models and Structural Similarity Index (SSIM)

arXiv.org Artificial Intelligence

Change detection is a crucial task in remote sensing, enabling the monitoring of environmental changes, urban growth, and disaster impact. Conventional change detection techniques, such as image differencing and ratioing, often struggle with noise and fail to capture complex variations in imagery. Recent advancements in machine learning, particularly generative models like diffusion models, offer new opportunities for enhancing change detection accuracy. In this paper, we propose a novel change detection framework that combines the strengths of Stable Diffusion models with the Structural Similarity Index (SSIM) to create robust and interpretable change maps. Our approach, named Diffusion Based Change Detector, is evaluated on both synthetic and real-world remote sensing datasets and compared with state-of-the-art methods. The results demonstrate that our method significantly outperforms traditional differencing techniques and recent deep learning-based methods, particularly in scenarios with complex changes and noise.


GeoTransformer: Enhancing Urban Forecasting with Geospatial Attention Mechanisms

arXiv.org Artificial Intelligence

Recent advancements have focused on encoding urban spatial information into high-dimensional spaces, with notable efforts dedicated to integrating sociodemographic data and satellite imagery. These efforts have established foundational models in this field. However, the effective utilization of these spatial representations for urban forecasting applications remains under-explored. To address this gap, we introduce GeoTransformer, a novel structure that synergizes the Transformer architecture with geospatial statistics prior. GeoTransformer employs an innovative geospatial attention mechanism to incorporate extensive urban information and spatial dependencies into a unified predictive model. Specifically, we compute geospatial weighted attention scores between the target region and surrounding regions and leverage the integrated urban information for predictions. Extensive experiments on GDP and ride-share demand prediction tasks demonstrate that GeoTransformer significantly outperforms existing baseline models, showcasing its potential to enhance urban forecasting tasks.


Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining

arXiv.org Artificial Intelligence

Current works focus on addressing the remote sensing change detection task using bi-temporal images. Although good performance can be achieved, however, seldom of they consider the motion cues which may also be vital. In this work, we revisit the widely adopted bi-temporal images-based framework and propose a novel Coarse-grained Temporal Mining Augmented (CTMA) framework. To be specific, given the bi-temporal images, we first transform them into a video using interpolation operations. Then, a set of temporal encoders is adopted to extract the motion features from the obtained video for coarse-grained changed region prediction. Subsequently, we design a novel Coarse-grained Foregrounds Augmented Spatial Encoder module to integrate both global and local information. We also introduce a motion augmented strategy that leverages motion cues as an additional output to aggregate with the spatial features for improved results. Meanwhile, we feed the input image pairs into the ResNet to get the different features and also the spatial blocks for fine-grained feature learning. More importantly, we propose a mask augmented strategy that utilizes coarse-grained changed regions, incorporating them into the decoder blocks to enhance the final changed prediction. Extensive experiments conducted on multiple benchmark datasets fully validated the effectiveness of our proposed framework for remote sensing image change detection. The source code of this paper will be released on https://github.com/Event-AHU/CTM_Remote_Sensing_Change_Detection


SpectralEarth: Training Hyperspectral Foundation Models at Scale

arXiv.org Artificial Intelligence

Foundation models have triggered a paradigm shift in computer vision and are increasingly being adopted in remote sensing, particularly for multispectral imagery. Yet, their potential in hyperspectral imaging (HSI) remains untapped due to the absence of comprehensive and globally representative hyperspectral datasets. To close this gap, we introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models leveraging data from the Environmental Mapping and Analysis Program (EnMAP). SpectralEarth comprises 538,974 image patches covering 415,153 unique locations from more than 11,636 globally distributed EnMAP scenes spanning two years of archive. Additionally, 17.5% of these locations include multiple timestamps, enabling multi-temporal HSI analysis. Utilizing state-of-the-art self-supervised learning (SSL) algorithms, we pretrain a series of foundation models on SpectralEarth. We integrate a spectral adapter into classical vision backbones to accommodate the unique characteristics of HSI. In tandem, we construct four downstream datasets for land-cover and crop-type mapping, providing benchmarks for model evaluation. Experimental results support the versatility of our models, showcasing their generalizability across different tasks and sensors. We also highlight computational efficiency during model fine-tuning. The dataset, models, and source code will be made publicly available.


Cross-View Geolocalization and Disaster Mapping with Street-View and VHR Satellite Imagery: A Case Study of Hurricane IAN

arXiv.org Artificial Intelligence

Nature disasters play a key role in shaping human-urban infrastructure interactions. Effective and efficient response to natural disasters is essential for building resilience and a sustainable urban environment. Two types of information are usually the most necessary and difficult to gather in disaster response. The first information is about disaster damage perception, which shows how badly people think that urban infrastructure has been damaged. The second information is geolocation awareness, which means how people whereabouts are made available. In this paper, we proposed a novel disaster mapping framework, namely CVDisaster, aiming at simultaneously addressing geolocalization and damage perception estimation using cross-view Street-View Imagery (SVI) and Very High-Resolution satellite imagery. CVDisaster consists of two cross-view models, where CVDisaster-Geoloc refers to a cross-view geolocalization model based on a contrastive learning objective with a Siamese ConvNeXt image encoder, and CVDisaster-Est is a cross-view classification model based on a Couple Global Context Vision Transformer (CGCViT). Taking Hurricane IAN as a case study, we evaluate the CVDisaster framework by creating a novel cross-view dataset (CVIAN) and conducting extensive experiments. As a result, we show that CVDisaster can achieve highly competitive performance (over 80% for geolocalization and 75% for damage perception estimation) with even limited fine-tuning efforts, which largely motivates future cross-view models and applications within a broader GeoAI research community. The data and code are publicly available at: https://github.com/tum-bgd/CVDisaster.


Clutter Classification Using Deep Learning in Multiple Stages

arXiv.org Artificial Intelligence

Path loss prediction for wireless communications is highly dependent on the local environment. Propagation models including clutter information have been shown to significantly increase model accuracy. This paper explores the application of deep learning to satellite imagery to identify environmental clutter types automatically. Recognizing these clutter types has numerous uses, but our main application is to use clutter information to enhance propagation prediction models. Knowing the type of obstruction (tree, building, and further classifications) can improve the prediction accuracy of key propagation metrics such as path loss.


Rethinking Feature Backbone Fine-tuning for Remote Sensing Object Detection

arXiv.org Artificial Intelligence

Recently, numerous methods have achieved impressive performance in remote sensing object detection, relying on convolution or transformer architectures. Such detectors typically have a feature backbone to extract useful features from raw input images. For the remote sensing domain, a common practice among current detectors is to initialize the backbone with pre-training on ImageNet consisting of natural scenes. Fine-tuning the backbone is then typically required to generate features suitable for remote-sensing images. However, this could hinder the extraction of basic visual features in long-term training, thus restricting performance improvement. To mitigate this issue, we propose a novel method named DBF (Dynamic Backbone Freezing) for feature backbone fine-tuning on remote sensing object detection. Our method aims to handle the dilemma of whether the backbone should extract low-level generic features or possess specific knowledge of the remote sensing domain, by introducing a module called 'Freezing Scheduler' to dynamically manage the update of backbone features during training. Extensive experiments on DOTA and DIOR-R show that our approach enables more accurate model learning while substantially reducing computational costs. Our method can be seamlessly adopted without additional effort due to its straightforward design.