Goto

Collaborating Authors

 Geophysical Analysis & Survey


Integrating Traditional and Deep Learning Methods to Detect Tree Crowns in Satellite Images

arXiv.org Artificial Intelligence

Global warming, loss of biodiversity, and air pollution are among the most significant problems facing Earth. One of the primary challenges in addressing these issues is the lack of monitoring forests to protect them. To tackle this problem, it is important to leverage remote sensing and computer vision methods to automate monitoring applications. Hence, automatic tree crown detection algorithms emerged based on traditional and deep learning methods. In this study, we first introduce two different tree crown detection methods based on these approaches. Then, we form a novel rule-based approach that integrates these two methods to enhance robustness and accuracy of tree crown detection results. While traditional methods are employed for feature extraction and segmentation of forested areas, deep learning methods are used to detect tree crowns in our method. With the proposed rule-based approach, we post-process these results, aiming to increase the number of detected tree crowns through neighboring trees and localized operations. We compare the obtained results with the proposed method in terms of the number of detected tree crowns and report the advantages, disadvantages, and areas for improvement of the obtained outcomes.


Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data

arXiv.org Artificial Intelligence

Scene categorization (SC) in remotely acquired images is an important subject with broad consequences in different fields, including catastrophe control, ecological observation, architecture for cities, and more. Nevertheless, its several apps, reaching a high degree of accuracy in SC from distant observation data has demonstrated to be difficult. This is because traditional conventional deep learning models require large databases with high variety and high levels of noise to capture impor tant visual featu res. To address these problems, this investigation file introduces an innovative technique referred to as the Cuttlefish Optimized Bidirectional Recurrent Neural Network (CO - BRNN) for type of scenes in remote sensing data. The investigation compares the execution of CO - BRNN with current techniques, including Multilayer Perceptron - Convolutional Neural Network (MLP - CNN), Convolutional Neural Network - Long Short Term Memory ( CNN - LSTM), and Long Short Term Memory - Conditional Random Field (LSTM - CRF), Graph - Based (GB), Multilabel Image Retrieval Model (MIRM - CF), Convolutional Neural Networks Data Augmentation (CNN - DA). The results demonstrate that CO - BRNN attained the maximum accuracy of 97%, followed by LSTM - CRF with 90%, MLP - CNN with 85%, and CNN - LSTM with 80%. The study highlights the significance of physical confirmation to ensure the efficiency of satellite data.


Scalable Dynamic Origin-Destination Demand Estimation Enhanced by High-Resolution Satellite Imagery Data

arXiv.org Artificial Intelligence

This study presents a novel integrated framework for dynamic origin-destination demand estimation (DODE) in multi-class mesoscopic network models, leveraging high-resolution satellite imagery together with conventional tra ffic data from local sensors. To extract information from imagery data, we design a computer vision pipeline for class-specific vehicle detection and map matching, generating link-level tra ffic density observations by vehicle class. Building upon this information, we formulate a computational graph-based DODE model that calibrates dynamic network states by jointly matching observed tra ffic counts and travel times from local sensors with density measurements derived from satellite imagery. To assess the accuracy and scalability of the proposed framework, we conduct a series of numerical experiments using both synthetic and real-world data. The results of out-of-sample tests demonstrate that supplementing traditional data with satellite-derived density significantly improves estimation performance, especially for links without local sensors. Real-world experiments also confirm the framework's capability to handle large-scale networks, supporting its potential for practical deployment in cities of varying sizes. Sensitivity analysis further evaluates the impact of data quality related to satellite imagery data. Introduction The widespread availability of spatio-temporal data has created new opportunities for advancing computational tools to model network flows, individual traveler behavior, and travel demand in dynamic transportation networks. Recent developments in sensing technologies and artificial intelligence are revolutionizing traditional models, making them more data-driven, scalable, and e ff ective for complex, large-scale networks. Dynamic Origin-destination Demand Estimation (DODE) is a foundational prerequisite for dynamic network models to accurately reproduce the status quo spatio-temporal network conditions, supporting tra ffic assignment (Pi et al. 2019) and control strategies (Y e et al. 2019, Liu, Ma & Qian 2023, Ke et al. 2025). DODE studies can be broadly categorized into model-based methods, which embed physics-informed tra ffic assignment models, and model-free methods, which formulate the problem using data-driven techniques without tra ffic assignment constraints.


Metadata, Wavelet, and Time Aware Diffusion Models for Satellite Image Super Resolution

arXiv.org Artificial Intelligence

The acquisition of high-resolution satellite imagery is often constrained by the spatial and temporal limitations of satellite sensors, as well as the high costs associated with frequent observations. These challenges hinder applications such as environmental monitoring, disaster response, and agricultural management, which require fine-grained and high-resolution data. In this paper, we propose MWT-Diff, an innovative framework for satellite image super-resolution (SR) that combines latent diffusion models with wavelet transforms to address these challenges. At the core of the framework is a novel metadata-, wavelet-, and time-aware encoder (MWT-Encoder), which generates embeddings that capture metadata attributes, multi-scale frequency information, and temporal relationships. The embedded feature representations steer the hierarchical diffusion dynamics, through which the model progressively reconstructs high-resolution satellite imagery from low-resolution inputs. This process preserves critical spatial characteristics including textural patterns, boundary discontinuities, and high-frequency spectral components essential for detailed remote sensing analysis. The comparative analysis of MWT-Diff across multiple datasets demonstrated favorable performance compared to recent approaches, as measured by standard perceptual quality metrics including FID and LPIPS.


A Deep Learning framework for building damage assessment using VHR SAR and geospatial data: demonstration on the 2023 Turkiye Earthquake

arXiv.org Artificial Intelligence

Building damage identification shortly after a disaster is crucial for guiding emergency response and recovery efforts. Although optical satellite imagery is commonly used for disaster mapping, its effectiveness is often hampered by cloud cover or the absence of pre-event acquisitions. To overcome these challenges, we introduce a novel multimodal deep learning (DL) framework for detecting building damage using single-date very high resolution (VHR) Synthetic Aperture Radar (SAR) imagery from the Italian Space Agency (ASI) COSMO SkyMed (CSK) constellation, complemented by auxiliary geospatial data. Our method integrates SAR image patches, OpenStreetMap (OSM) building footprints, digital surface model (DSM) data, and structural and exposure attributes from the Global Earthquake Model (GEM) to improve detection accuracy and contextual interpretation. Unlike existing approaches that depend on pre and post event imagery, our model utilizes only post event data, facilitating rapid deployment in critical scenarios. The framework effectiveness is demonstrated using a new dataset from the 2023 earthquake in Turkey, covering multiple cities with diverse urban settings. Results highlight that incorporating geospatial features significantly enhances detection performance and generalizability to previously unseen areas. By combining SAR imagery with detailed vulnerability and exposure information, our approach provides reliable and rapid building damage assessments without the dependency from available pre-event data. Moreover, the automated and scalable data generation process ensures the framework's applicability across diverse disaster-affected regions, underscoring its potential to support effective disaster management and recovery efforts. Code and data will be made available upon acceptance of the paper.


SDRNET: Stacked Deep Residual Network for Accurate Semantic Segmentation of Fine-Resolution Remotely Sensed Images

arXiv.org Artificial Intelligence

Land cover maps generated from semantic segmentation of high-resolution remotely sensed images have drawn mucon in the photogrammetry and remote sensing research community. Currently, massive fine-resolution remotely sensed (FRRS) images acquired by improving sensing and imaging technologies become available. However, accurate semantic segmentation of such FRRS images is greatly affected by substantial class disparities, the invisibility of key ground objects due to occlusion, and object size variation. Despite the extraordinary potential in deep convolutional neural networks (DCNNs) in image feature learning and representation, extracting sufficient features from FRRS images for accurate semantic segmentation is still challenging. These challenges demand the deep learning models to learn robust features and generate sufficient feature descriptors. Specifically, learning multi-contextual features to guarantee adequate coverage of varied object sizes from the ground scene and harnessing global-local contexts to overcome class disparities challenge even profound networks. Deeper networks significantly lose spatial details due to gradual downsampling processes resulting in poor segmentation results and coarse boundaries. This article presents a stacked deep residual network (SDRNet) for semantic segmentation from FRRS images. The proposed framework utilizes two stacked encoder-decoder networks to harness long-range semantics yet preserve spatial information and dilated residual blocks (DRB) between each encoder and decoder network to capture sufficient global dependencies thus improving segmentation performance. Our experimental results obtained using the ISPRS Vaihingen and Potsdam datasets demonstrate that the SDRNet performs effectively and competitively against current DCNNs in semantic segmentation.


MambaOutRS: A Hybrid CNN-Fourier Architecture for Remote Sensing Image Classification

arXiv.org Artificial Intelligence

Recent advances in deep learning for vision tasks have seen the rise of State Space Models (SSMs) like Mamba, celebrated for their linear scalability. However, their adaptation to 2D visual data often necessitates complex modifications that may diminish efficiency. In this paper, we introduce MambaOutRS, a novel hybrid convolutional architecture for remote sensing image classification that re-evaluates the necessity of recurrent SSMs. MambaOutRS builds upon stacked Gated CNN blocks for local feature extraction and introduces a novel Fourier Filter Gate (FFG) module that operates in the frequency domain to capture global contextual information efficiently. Our architecture employs a four-stage hierarchical design and was extensively evaluated on challenging remote sensing datasets: UC Merced, AID, NWPU-RESISC45, and EuroSAT. MambaOutRS consistently achieved state-of-the-art (SOTA) performance across these benchmarks. Notably, our MambaOutRS-t variant (24.0M parameters) attained the highest F1-scores of 98.41\% on UC Merced and 95.99\% on AID, significantly outperforming existing baselines, including larger transformer models and Mamba-based architectures, despite using considerably fewer parameters. An ablation study conclusively demonstrates the critical role of the Fourier Filter Gate in enhancing the model's ability to capture global spatial patterns, leading to robust and accurate classification. These results strongly suggest that the complexities of recurrent SSMs can be effectively superseded by a judicious combination of gated convolutions for spatial mixing and frequency-based gates for spectral global context. Thus, MambaOutRS provides a compelling and efficient paradigm for developing high-performance deep learning models in remote sensing and other vision domains, particularly where computational efficiency is paramount.


A Global-Local Cross-Attention Network for Ultra-high Resolution Remote Sensing Image Semantic Segmentation

arXiv.org Artificial Intelligence

With the rapid development of ultra-high resolution (UHR) remote sensing technology, the demand for accurate and efficient semantic segmentation has increased significantly. However, existing methods face challenges in computational efficiency and multi-scale feature fusion. To address these issues, we propose GLCANet (Global-Local Cross-Attention Network), a lightweight segmentation framework designed for UHR remote sensing imagery.GLCANet employs a dual-stream architecture to efficiently fuse global semantics and local details while minimizing GPU usage. A self-attention mechanism enhances long-range dependencies, refines global features, and preserves local details for better semantic consistency. A masked cross-attention mechanism also adaptively fuses global-local features, selectively enhancing fine-grained details while exploiting global context to improve segmentation accuracy. Experimental results show that GLCANet outperforms state-of-the-art methods regarding accuracy and computational efficiency. The model effectively processes large, high-resolution images with a small memory footprint, providing a promising solution for real-world remote sensing applications.


DiffRIS: Enhancing Referring Remote Sensing Image Segmentation with Pre-trained Text-to-Image Diffusion Models

arXiv.org Artificial Intelligence

Referring remote sensing image segmentation (RRSIS) enables the precise delineation of regions within remote sensing imagery through natural language descriptions, serving critical applications in disaster response, urban development, and environmental monitoring. Despite recent advances, current approaches face significant challenges in processing aerial imagery due to complex object characteristics including scale variations, diverse orientations, and semantic ambiguities inherent to the overhead perspective. To address these limitations, we propose Di ffRIS, a novel framework that harnesses the semantic understanding capabilities of pre-trained text-to-image di ff usion models for enhanced cross-modal alignment in RRSIS tasks. Our framework introduces two key innovations: a context perception adapter (CP-adapter) that dynamically refines linguistic features through global context modeling and object-aware reasoning, and a progressive cross-modal reasoning decoder (PCMRD) that iteratively aligns textual descriptions with visual regions for precise segmentation. The CP-adapter bridges the domain gap between general vision-language understanding and remote sensing applications, while PCMRD enables fine-grained semantic alignment through multi-scale feature interaction. Comprehensive experiments on three benchmark datasets--RRSIS-D, RefSegRS, and RISBench--demonstrate that Di ffRIS consistently outperforms existing methods across all standard metrics, establishing a new state-of-the-art for RRSIS tasks. The significant performance improvements validate the e ff ectiveness of leveraging pre-trained di ff usion models for remote sensing applications through our proposed adaptive framework. Introduction Referring remote sensing image segmentation (RRSIS) aims to identify specific regions in remote sensing imagery based on given textual conditions, making it particularly suitable for practical applications such as defense reconnaissance[1], climate impact studies[2], urban infrastructure management[3], and land use categorization[4]. Unlike traditional single-modal segmentation methods[5, 6], RRSIS leverages textual descriptions to guide image segmentation, overcoming the limitations of fixed category labels and enabling the processing of more diverse vocabulary and syntactic variations.


Segment Anything for Satellite Imagery: A Strong Baseline and a Regional Dataset for Automatic Field Delineation

arXiv.org Artificial Intelligence

Accurate mapping of agricultural field boundaries is essential for the efficient operation of agriculture. Automatic extraction from high-resolution satellite imagery, supported by computer vision techniques, can avoid costly ground surveys. In this paper, we present a pipeline for field delineation based on the Segment Anything Model (SAM), introducing a fine-tuning strategy to adapt SAM to this task. In addition to using published datasets, we describe a method for acquiring a complementary regional dataset that covers areas beyond current sources. Extensive experiments assess segmentation accuracy and evaluate the generalization capabilities. Our approach provides a robust baseline for automated field delineation. The new regional dataset, known as ERAS, is now publicly available.