Goto

Collaborating Authors

 Geophysical Analysis & Survey


Weak Labeling for Cropland Mapping in Africa

arXiv.org Artificial Intelligence

If the goal is to achieve better results in specific regions, models Cropland mapping can play a vital role in addressing environmental, that are tailored to those regions usually perform better than agricultural, and food security challenges. However, models that are designed for the whole world. in the context of Africa, practical applications are often hindered To this end, we develop a modeling workflow for generating by the limited availability of high-resolution cropland high-resolution cropland maps that are tailored toward a maps. Such maps typically require extensive human labeling, given area of interest (AOI), using Kenya as a use case. We use thereby creating a scalability bottleneck. To address this, we a deep learning based semantic segmentation workflow - an approach propose an approach that utilizes unsupervised object clustering often employed for land-cover maps [9, 10, 11, 12, 13]. to refine existing weak labels, such as those obtained In order to train the models, we used a mixture of sparse human from global cropland maps. The refined labels, in conjunction labels gathered in the AOI and weak labels from global with sparse human annotations, serve as training data for a cropland maps. Specifically we use the area of intersection semantic segmentation network designed to identify cropland between an unsupervised object based clustering of the input areas. We conduct experiments to demonstrate the benefits of satellite imagery and the weak labels to mine stronger cropland the improved weak labels generated by our method. In a scenario (positive class) and non-cropland (negative class) samples (see where we train our model with only 33 human-annotated Figure 1 for an overview of this approach).


DDPM-CD: Denoising Diffusion Probabilistic Models as Feature Extractors for Change Detection

arXiv.org Artificial Intelligence

Remote sensing change detection is crucial for understanding the dynamics of our planet's surface, facilitating the monitoring of environmental changes, evaluating human impact, predicting future trends, and supporting decision-making. In this work, we introduce a novel approach for change detection that can leverage off-the-shelf, unlabeled remote sensing images in the training process by pre-training a Denoising Diffusion Probabilistic Model (DDPM) - a class of generative models used in image synthesis. DDPMs learn the training data distribution by gradually converting training images into a Gaussian distribution using a Markov chain. During inference (i.e., sampling), they can generate a diverse set of samples closer to the training distribution, starting from Gaussian noise, achieving state-of-the-art image synthesis results. However, in this work, our focus is not on image synthesis but on utilizing it as a pre-trained feature extractor for the downstream application of change detection. Specifically, we fine-tune a lightweight change classifier utilizing the feature representations produced by the pre-trained DDPM alongside change labels. Experiments conducted on the LEVIR-CD, WHU-CD, DSIFN-CD, and CDD datasets demonstrate that the proposed DDPM-CD method significantly outperforms the existing state-of-the-art change detection methods in terms of F1 score, IoU, and overall accuracy, highlighting the pivotal role of pre-trained DDPM as a feature extractor for downstream applications. We have made both the code and pre-trained models available at https://github.com/wgcban/ddpm-cd


BD-MSA: Body decouple VHR Remote Sensing Image Change Detection method guided by multi-scale feature information aggregation

arXiv.org Artificial Intelligence

The purpose of remote sensing image change detection (RSCD) is to detect differences between bi-temporal images taken at the same place. Deep learning has been extensively used to RSCD tasks, yielding significant results in terms of result recognition. However, due to the shooting angle of the satellite, the impacts of thin clouds, and certain lighting conditions, the problem of fuzzy edges in the change region in some remote sensing photographs cannot be properly handled using current RSCD algorithms. To solve this issue, we proposed a Body Decouple Multi-Scale by fearure Aggregation change detection (BD-MSA), a novel model that collects both global and local feature map information in the channel and space dimensions of the feature map during the training and prediction phases. This approach allows us to successfully extract the change region's boundary information while also divorcing the change region's main body from its boundary. Numerous studies have shown that the assessment metrics and evaluation effects of the model described in this paper on the publicly available datasets DSIFN-CD and S2Looking are the best when compared to other models.


Kronecker Product Feature Fusion for Convolutional Neural Network in Remote Sensing Scene Classification

arXiv.org Artificial Intelligence

Remote Sensing Scene Classification is a challenging and valuable research topic, in which Convolutional Neural Network (CNN) has played a crucial role. CNN can extract hierarchical convolutional features from remote sensing imagery, and Feature Fusion of different layers can enhance CNN's performance. Two successful Feature Fusion methods, Add and Concat, are employed in certain state-of-the-art CNN algorithms. In this paper, we propose a novel Feature Fusion algorithm, which unifies the aforementioned methods using the Kronecker Product (KPFF), and we discuss the Backpropagation procedure associated with this algorithm. To validate the efficacy of the proposed method, a series of experiments are designed and conducted. The results demonstrate its effectiveness of enhancing CNN's accuracy in Remote sensing scene classification.


A comprehensive survey of research towards AI-enabled unmanned aerial systems in pre-, active-, and post-wildfire management

arXiv.org Artificial Intelligence

Wildfires have emerged as one of the most destructive natural disasters worldwide, causing catastrophic losses in both human lives and forest wildlife. Recently, the use of Artificial Intelligence (AI) in wildfires, propelled by the integration of Unmanned Aerial Vehicles (UAVs) and deep learning models, has created an unprecedented momentum to implement and develop more effective wildfire management. Although some of the existing survey papers have explored various learning-based approaches, a comprehensive review emphasizing the application of AI-enabled UAV systems and their subsequent impact on multi-stage wildfire management is notably lacking. This survey aims to bridge these gaps by offering a systematic review of the recent state-of-the-art technologies, highlighting the advancements of UAV systems and AI models from pre-fire, through the active-fire stage, to post-fire management. To this aim, we provide an extensive analysis of the existing remote sensing systems with a particular focus on the UAV advancements, device specifications, and sensor technologies relevant to wildfire management. We also examine the pre-fire and post-fire management approaches, including fuel monitoring, prevention strategies, as well as evacuation planning, damage assessment, and operation strategies. Additionally, we review and summarize a wide range of computer vision techniques in active-fire management, with an emphasis on Machine Learning (ML), Reinforcement Learning (RL), and Deep Learning (DL) algorithms for wildfire classification, segmentation, detection, and monitoring tasks. Ultimately, we underscore the substantial advancement in wildfire modeling through the integration of cutting-edge AI techniques and UAV-based data, providing novel insights and enhanced predictive capabilities to understand dynamic wildfire behavior.


Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework

arXiv.org Artificial Intelligence

Remote Sensing Target Fine-grained Classification (TFGC) is of great significance in both military and civilian fields. Due to location differences, growth in data size, and centralized server storage constraints, these data are usually stored under different databases across regions/countries. However, privacy laws and national security concerns constrain researchers from accessing these sensitive remote sensing images for further analysis. Additionally, low-resource remote sensing devices encounter challenges in terms of communication overhead and efficiency when dealing with the ever-increasing data and model scales. To solve the above challenges, this paper proposes a novel Privacy-Reserving TFGC Framework based on Federated Learning, dubbed PRFL. The proposed framework allows each client to learn global and local knowledge to enhance the local representation of private data in environments with extreme statistical heterogeneity (non. Independent and Identically Distributed, IID). Thus, it provides highly customized models to clients with differentiated data distributions. Moreover, the framework minimizes communication overhead and improves efficiency while ensuring satisfactory performance, thereby enhancing robustness and practical applicability under resource-scarce conditions. We demonstrate the effectiveness of the proposed PRFL on the classical TFGC task by leveraging four public datasets.


RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing

arXiv.org Artificial Intelligence

Pre-trained Vision-Language Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-related downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-related downstream tasks. In this paper, we propose a new framework that includes the Domain pre-trained Vision-Language Model (DVLM), bridging the gap between the General Vision-Language Model (GVLM) and domain-specific downstream tasks. Moreover, we present an image-text paired dataset in the field of remote sensing (RS), RS5M, which has 5 million RS images with English descriptions. The dataset is obtained from filtering publicly available image-text paired datasets and captioning label-only RS datasets with pre-trained VLM. These constitute the first large-scale RS image-text paired dataset. Additionally, we fine-tuned the CLIP model and tried several Parameter-Efficient Fine-Tuning methods on RS5M to implement the DVLM. Experimental results show that our proposed dataset is highly effective for various tasks, and our model GeoRSCLIP improves upon the baseline or previous state-of-the-art model by $3\%\sim20\%$ in Zero-shot Classification (ZSC), $3\%\sim6\%$ in Remote Sensing Cross-Modal Text-Image Retrieval (RSCTIR) and $4\%\sim5\%$ in Semantic Localization (SeLo) tasks. Dataset and models have been released in: \url{https://github.com/om-ai-lab/RS5M}.


AI Powered Road Network Prediction with Multi-Modal Data

arXiv.org Artificial Intelligence

This study presents an innovative approach for automatic road detection with deep learning, by employing fusion strategies for utilizing both lower-resolution satellite imagery and GPS trajectory data, a concept never explored before. We rigorously investigate both early and late fusion strategies, and assess deep learning based road detection performance using different fusion settings. Our extensive ablation studies assess the efficacy of our framework under diverse model architectures, loss functions, and geographic domains (Istanbul and Montreal). For an unbiased and complete evaluation of road detection results, we use both region-based and boundary-based evaluation metrics for road segmentation. The outcomes reveal that the ResUnet model outperforms U-Net and D-Linknet in road extraction tasks, achieving superior results over the benchmark study using low-resolution Sentinel-2 data. This research not only contributes to the field of automatic road detection but also offers novel insights into the utilization of data fusion methods in diverse applications.


Landslide Detection and Segmentation Using Remote Sensing Images and Deep Neural Network

arXiv.org Artificial Intelligence

Knowledge about historic landslide event occurrence is important for supporting disaster risk reduction strategies. Building upon findings from 2022 Landslide4Sense Competition, we propose a deep neural network based system for landslide detection and segmentation from multisource remote sensing image input. We use a U-Net trained with Cross Entropy loss as baseline model. We then improve the U-Net baseline model by leveraging a wide range of deep learning techniques. In particular, we conduct feature engineering by generating new band data from the original bands, which helps to enhance the quality of remote sensing image input. Regarding the network architecture, we replace traditional convolutional layers in the U-Net baseline by a residual-convolutional layer. We also propose an attention layer which leverages the multi-head attention scheme. Additionally, we generate multiple output masks with three different resolutions, which creates an ensemble of three outputs in the inference process to enhance the performance. Finally, we propose a combined loss function which leverages Focal loss and IoU loss to train the network. Our experiments on the development set of the Landslide4Sense challenge achieve an F1 score and an mIoU score of 84.07 and 76.07, respectively. Our best model setup outperforms the challenge baseline and the proposed U-Net baseline, improving the F1 score/mIoU score by 6.8/7.4 and 10.5/8.8, respectively.


GEO-Bench: Toward Foundation Models for Earth Monitoring

arXiv.org Artificial Intelligence

Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to substantial increases in generalization to downstream tasks. Such models, recently coined foundation models, have been transformational to the field of natural language processing. Variants have also been proposed for image data, but their applicability to remote sensing tasks is limited. To stimulate the development of foundation models for Earth monitoring, we propose a benchmark comprised of six classification and six segmentation tasks, which were carefully curated and adapted to be both relevant to the field and well-suited for model evaluation. We accompany this benchmark with a robust methodology for evaluating models and reporting aggregated results to enable a reliable assessment of progress. Finally, we report results for 20 baselines to gain information about the performance of existing models. We believe that this benchmark will be a driver of progress across a variety of Earth monitoring tasks.