 Geophysical Analysis & Survey


Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit

arXiv.org Artificial Intelligence

With the development of lightweight deep learning algorithms, various deep neural network (DNN) models have been proposed for remote sensing scene classification (RSSC). However, it is still challenging for these RSSC models to achieve an optimal balance among model accuracy, inference latency, and energy consumption on resource-constrained edge devices. In this paper, we propose a lightweight RSSC framework, which includes a distilled global filter network (GFNet) model and an early-exit mechanism designed for edge devices to achieve state-of-the-art performance. Specifically, we first apply frequency domain distillation on the GFNet model to reduce model size. Then we design a dynamic early-exit model tailored for DNN models on edge devices to further improve model inference efficiency. We evaluate our E3C model on three edge devices across four datasets. Extensive experimental results show that it achieves an average of 1.3x speedup on model inference and over 40% improvement on energy efficiency, while maintaining high classification accuracy.
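
The early-exit idea mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the paper's actual exit criterion, so the code below assumes a common variant: each stage has its own classifier head, and inference stops at the first stage whose softmax confidence reaches a threshold. The names `early_exit_predict`, `stages`, and `threshold` are hypothetical and chosen only for illustration.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# A stage maps incoming features to (new_features, class_logits).
Stage = Callable[[Any], Tuple[Any, Sequence[float]]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x: Any, stages: Sequence[Stage],
                       threshold: float = 0.9) -> Tuple[int, int]:
    """Run stages in order and stop at the first confident one.

    Returns (predicted_class, number_of_stages_executed).
    Assumption: confidence is the maximum softmax probability.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features = x
    probs: List[float] = []
    executed = 0
    for stage in stages:
        features, logits = stage(features)
        probs = softmax(logits)
        executed += 1
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining, costlier stages
    return probs.index(max(probs)), executed


# Toy stages standing in for successive blocks of a network.
toy_stages = [
    lambda f: (f, [0.1, 0.2]),   # low confidence
    lambda f: (f, [0.0, 5.0]),   # high confidence for class 1
    lambda f: (f, [9.0, 0.0]),   # only reached if earlier exits fail
]
```

With the toy stages above, `early_exit_predict(None, toy_stages)` stops after the second stage. Lowering `threshold` trades accuracy for latency by exiting earlier.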


A Novel Monte-Carlo Compressed Sensing and Dictionary Learning Method for the Efficient Path Planning of Remote Sensing Robots

arXiv.org Artificial Intelligence

In recent years, Compressed Sensing (CS) has gained significant interest as a technique for acquiring high-resolution sensory data using fewer measurements than traditional Nyquist sampling requires. At the same time, autonomous robotic platforms such as drones and rovers have become increasingly popular tools for remote sensing and environmental monitoring tasks, including measurements of temperature, humidity, and air quality. Within this context, this paper presents, to the best of our knowledge, the first investigation into how the structure of CS measurement matrices can be exploited to design optimized sampling trajectories for robotic environmental data collection. We propose a novel Monte Carlo optimization framework that generates measurement matrices designed to minimize both the robot's traversal path length and the signal reconstruction error within the CS framework. Central to our approach is the application of Dictionary Learning (DL) to obtain a data-driven sparsifying transform, which enhances reconstruction accuracy while further reducing the number of samples that the robot needs to collect. We demonstrate the effectiveness of our method through experiments reconstructing $NO_2$ pollution maps over the Gulf region. The results indicate that our approach can reduce robot travel distance to less than $10\%$ of a full-coverage path, while improving reconstruction accuracy by over a factor of five compared to traditional CS methods based on DCT and polynomial dictionaries, as well as by a factor of two compared to previously-proposed Informative Path Planning (IPP) methods.


A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area

arXiv.org Artificial Intelligence

The Tibetan Plateau, known as the Asian Water Tower, faces significant water security challenges due to its high sensitivity to climate change. Advancing Earth observation for sustainable water monitoring is thus essential for building climate resilience in this region. This study proposes a two-stage transfer learning strategy using the SegFormer model to overcome domain shift and data scarcity, two key barriers in developing robust AI for climate-sensitive applications. After pre-training on a diverse source domain, our model was fine-tuned for the arid Zhada Tulin area. Experimental results show a substantial performance boost: the Intersection over Union (IoU) for water body segmentation surged from 25.50% (direct transfer) to 64.84%. This AI-driven accuracy is crucial for disaster risk reduction, particularly in monitoring flash flood-prone systems. More importantly, the high-precision map reveals a highly concentrated spatial distribution of water, with over 80% of the water area confined to less than 20% of the river channel length. This quantitative finding provides crucial evidence for understanding hydrological processes and designing targeted water management and climate adaptation strategies. Our work thus demonstrates an effective technical solution for monitoring arid plateau regions and contributes to advancing AI-powered Earth observation for disaster preparedness in critical transboundary river headwaters.
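
For readers unfamiliar with the metric reported above, Intersection over Union for a binary water mask is the number of pixels marked as water in both prediction and ground truth, divided by the number marked as water in either. The sketch below is a generic illustration and not the paper's evaluation code; the function name `binary_iou` and the convention of returning 1.0 when both masks are empty are assumptions.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]],
               target: Sequence[Sequence[int]]) -> float:
    """IoU between two binary masks given as nested sequences of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
print(binary_iou(pred_mask, true_mask))  # 2 shared pixels / 4 in either = 0.5
```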


Comparison of Segmentation Methods in Remote Sensing for Land Use Land Cover

arXiv.org Artificial Intelligence

Land Use Land Cover (LULC) mapping is essential for urban and resource planning, and is one of the key elements in developing smart and sustainable cities. This study evaluates advanced LULC mapping techniques, focusing on Look-Up Table (LUT)-based Atmospheric Correction applied to Cartosat Multispectral (MX) sensor images, followed by supervised and semi-supervised learning models for LULC prediction. We explore DeeplabV3+ and Cross-Pseudo Supervision (CPS). The CPS model is further refined with dynamic weighting, enhancing pseudo-label reliability during training. This comprehensive approach analyses the accuracy and utility of LULC mapping techniques for various urban planning applications. A case study of Hyderabad, India, illustrates significant land use changes due to rapid urbanization. By analyzing Cartosat MX images over time, we highlight shifts such as urban sprawl, shrinking green spaces, and expanding industrial areas. This demonstrates the practical utility of these techniques for urban planners and policymakers.


Post-Disaster Affected Area Segmentation with a Vision Transformer (ViT)-based EVAP Model using Sentinel-2 and Formosat-5 Imagery

arXiv.org Artificial Intelligence

We propose a vision transformer (ViT)-based deep learning framework to refine disaster-affected area segmentation from remote sensing imagery, aiming to support and enhance the Emergent Value Added Product (EVAP) developed by the Taiwan Space Agency (TASA). The process starts with a small set of manually annotated regions. We then apply principal component analysis (PCA)-based feature space analysis and construct a confidence index (CI) to expand these labels, producing a weakly supervised training set. These expanded labels are then used to train ViT-based encoder-decoder models with multi-band inputs from Sentinel-2 and Formosat-5 imagery. Our architecture supports multiple decoder variants and multi-stage loss strategies to improve performance under limited supervision. During the evaluation, model predictions are compared with higher-resolution EVAP output to assess spatial coherence and segmentation consistency. Case studies on the 2022 Poyang Lake drought and the 2023 Rhodes wildfire demonstrate that our framework improves the smoothness and reliability of segmentation results, offering a scalable approach for disaster mapping when accurate ground truth is unavailable.


HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery

arXiv.org Artificial Intelligence

With the increasing resolution of remote sensing imagery (RSI), large-size RSI has emerged as a vital data source for high-precision vector mapping of geographic objects. Existing methods are typically constrained to processing small image patches, which often leads to the loss of contextual information and produces fragmented vector outputs. To address these limitations, this paper introduces HoliTracer, the first framework designed to holistically extract vectorized geographic objects from large-size RSI. In HoliTracer, we enhance segmentation of large-size RSI using the Context Attention Net (CAN), which employs a local-to-global attention mechanism to capture contextual dependencies. Furthermore, we achieve holistic vectorization through a robust pipeline that leverages the Mask Contour Reformer (MCR) to reconstruct polygons and the Polygon Sequence Tracer (PST) to trace vertices. Extensive experiments on large-size RSI datasets, including buildings, water bodies, and roads, demonstrate that HoliTracer outperforms state-of-the-art methods. Our code and data are available at https://github.com/vvangfaye/HoliTracer.


Using Multiple Input Modalities Can Improve Data-Efficiency and O.O.D. Generalization for ML with Satellite Imagery

arXiv.org Artificial Intelligence

A large variety of geospatial data layers is available around the world ranging from remotely-sensed raster data like satellite imagery, digital elevation models, predicted land cover maps, and human-annotated data, to data derived from environmental sensors such as air temperature or wind speed data. A large majority of machine learning models trained on satellite imagery (SatML), however, are designed primarily for optical input modalities such as multi-spectral satellite imagery. To better understand the value of using other input modalities alongside optical imagery in supervised learning settings, we generate augmented versions of SatML benchmark tasks by appending additional geographic data layers to datasets spanning classification, regression, and segmentation. Using these augmented datasets, we find that fusing additional geographic inputs with optical imagery can significantly improve SatML model performance. Benefits are largest in settings where labeled data are limited and in geographic out-of-sample settings, suggesting that multi-modal inputs may be especially valuable for data-efficiency and out-of-sample performance of SatML models. Surprisingly, we find that hard-coded fusion strategies outperform learned variants, with interesting implications for future work.


Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery

arXiv.org Artificial Intelligence

Protected natural areas play a vital role in ecological balance and ecosystem services. Monitoring these regions at scale using satellite imagery and machine learning is promising, but current methods often lack interpretability and uncertainty-awareness, and do not address how uncertainty affects naturalness assessment. In contrast, we propose Confidence-Filtered Relevance (CFR), a data-centric framework that combines LRP Attention Rollout with Deep Deterministic Uncertainty (DDU) estimation to analyze how model uncertainty influences the interpretability of relevance heatmaps. CFR partitions the dataset into subsets based on uncertainty thresholds, enabling systematic analysis of how uncertainty shapes the explanations of naturalness in satellite imagery. Applied to the AnthroProtect dataset, CFR assigned higher relevance to shrublands, forests, and wetlands, aligning with other research on naturalness assessment. Moreover, our analysis shows that as uncertainty increases, the interpretability of these relevance heatmaps declines and their entropy grows, indicating less selective and more ambiguous attributions. CFR provides a data-centric approach to assess the relevance of patterns to naturalness in satellite imagery based on their associated certainty.
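
The link drawn above between heatmap entropy and selectivity can be made concrete. The abstract does not say how entropy is computed, so this sketch assumes one plausible definition: normalize the absolute relevance values so they sum to one, then take the Shannon entropy in bits. A heatmap concentrated on a few pixels scores low; a diffuse one scores high. The function name `heatmap_entropy` is hypothetical.

```python
import math
from typing import Sequence


def heatmap_entropy(relevance: Sequence[Sequence[float]]) -> float:
    """Shannon entropy (bits) of a relevance heatmap.

    Assumption: absolute relevance values are normalized into a
    probability distribution over pixels before computing entropy.
    """
    flat = [abs(v) for row in relevance for v in row]
    total = sum(flat)
    if total == 0:
        return 0.0  # no relevance anywhere: treat as zero entropy
    entropy = 0.0
    for v in flat:
        if v > 0:
            p = v / total
            entropy -= p * math.log2(p)
    return entropy


focused = [[1.0, 0.0],
           [0.0, 0.0]]     # all relevance on one pixel
diffuse = [[0.25, 0.25],
           [0.25, 0.25]]   # relevance spread evenly

print(heatmap_entropy(focused))  # 0.0 bits: fully selective
print(heatmap_entropy(diffuse))  # 2.0 bits: maximally ambiguous for 4 pixels
```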


SEMT: Static-Expansion-Mesh Transformer Network Architecture for Remote Sensing Image Captioning

arXiv.org Artificial Intelligence

Image captioning has emerged as a crucial task at the intersection of computer vision and natural language processing, enabling automated generation of descriptive text from visual content. In the context of remote sensing, image captioning plays a significant role in interpreting vast and complex satellite imagery, aiding applications such as environmental monitoring, disaster assessment, and urban planning. This motivates us, in this paper, to present a transformer-based network architecture for remote sensing image captioning (RSIC) in which the techniques of Static Expansion, Memory-Augmented Self-Attention, and Mesh Transformer are evaluated and integrated. We evaluate our proposed models using two benchmark remote sensing image datasets, UCM-Caption and NWPU-Caption. Our best model outperforms the state-of-the-art systems on most evaluation metrics, which demonstrates its potential for real-life remote sensing image systems.


GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM

arXiv.org Artificial Intelligence

Socio-economic indicators like regional GDP, population, and education levels are crucial to shaping policy decisions and fostering sustainable development. This research introduces GeoReg, a regression model that integrates diverse data sources, including satellite imagery and web-based geospatial information, to estimate these indicators even for data-scarce regions such as developing countries. Our approach leverages the prior knowledge of a large language model (LLM) to address the scarcity of labeled data, with the LLM functioning as a data engineer by extracting informative features to enable effective estimation in few-shot settings. Specifically, our model obtains contextual relationships between data features and the target indicator, categorizing their correlations as positive, negative, mixed, or irrelevant. These features are then fed into the linear estimator with tailored weight constraints for each category. To capture nonlinear patterns, the model also identifies meaningful feature interactions and integrates them, along with nonlinear transformations. Experiments across three countries at different stages of development demonstrate that our model outperforms baselines in estimating socio-economic indicators, even for low-income countries with limited data availability.
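
The category-specific weight constraints described above can be sketched with projected gradient descent on a linear model. The abstract does not spell out the constraints, so the mapping used here is an assumption made for illustration: "positive" features get non-negative weights, "negative" features get non-positive weights, "irrelevant" features are fixed at zero, and "mixed" features are left unconstrained. The names `project` and `fit_constrained` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def project(weight: float, category: str) -> float:
    """Clip a weight to the feasible set assumed for its category."""
    if category == "positive":
        return max(weight, 0.0)
    if category == "negative":
        return min(weight, 0.0)
    if category == "irrelevant":
        return 0.0
    return weight  # "mixed": no constraint


def fit_constrained(X: Sequence[Sequence[float]], y: Sequence[float],
                    categories: Sequence[str], lr: float = 0.05,
                    steps: int = 5000) -> Tuple[List[float], float]:
    """Least squares with per-feature sign constraints.

    Each step takes a gradient step on mean squared error and then
    projects every weight back onto its category's feasible set.
    """
    n, d = len(X), len(categories)
    w = [0.0] * d
    b = 0.0
    for _ in range(steps):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                     for row, yi in zip(X, y)]
        grad_w = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(residuals)
        w = [project(wj - lr * gj, c)
             for wj, gj, c in zip(w, grad_w, categories)]
        b -= lr * grad_b
    return w, b


# Toy data: the target is 2*x0 - 1*x1; the third feature is a distractor.
X_toy = [[1, 0, 5], [0, 1, 3], [1, 1, 1], [2, 1, 0], [0, 2, 2]]
y_toy = [2, -1, 1, 3, -2]
weights, bias = fit_constrained(
    X_toy, y_toy, ["positive", "negative", "irrelevant"])
```

In a few-shot setting the constraints act as a prior: with only a handful of labeled regions, they keep the estimator from fitting a sign that contradicts the assumed relationship.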