Geophysical Analysis & Survey
HyperspectralViTs: General Hyperspectral Models for On-board Remote Sensing
On-board processing of hyperspectral data with machine learning models would enable an unprecedented amount of autonomy for a wide range of tasks, for example methane detection or mineral identification. This can enable early warning systems and could allow new capabilities such as automated scheduling across constellations of satellites. Classical methods suffer from high false positive rates, and previous deep learning models exhibit prohibitive computational requirements. We propose fast and accurate machine learning architectures which support end-to-end training with data of high spectral dimension without relying on hand-crafted products or spectral band compression preprocessing. We evaluate our models on two tasks related to hyperspectral data processing. With our proposed general architectures, we improve the F1 score of the previous methane detection state-of-the-art models by 27% on a newly created synthetic dataset and by 13% on the previously released large benchmark dataset. We also demonstrate that training models on the synthetic dataset improves performance of models finetuned on the dataset of real events by 6.9% in F1 score compared with training from scratch. On a newly created dataset for mineral identification, our models provide a 3.5% improvement in the F1 score compared with the default versions of the models. With our proposed models we improve the inference speed by 85% compared with previous classical and deep learning approaches by removing the dependency on classically computed features. With our architecture, one capture from the EMIT sensor can be processed within 30 seconds on a realistic proxy of the ION-SCV 004 satellite.
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
Ghaboura, Sara, Heakl, Ahmed, Thawakar, Omkar, Alharthi, Ali, Riahi, Ines, Saif, Abduljalil, Laaksonen, Jorma, Khan, Fahad S., Khan, Salman, Anwer, Rao M.
Recent years have witnessed a significant interest in developing large multimodal models (LMMs) capable of performing various visual reasoning and understanding tasks. This has led to the introduction of multiple LMM benchmarks to evaluate LMMs on different tasks. However, most existing LMM evaluation benchmarks are predominantly English-centric. In this work, we develop a comprehensive LMM evaluation benchmark for the Arabic language to represent a large population of over 400 million speakers. The proposed benchmark, named CAMEL-Bench, comprises eight diverse domains and 38 sub-domains, including multi-image understanding, complex visual perception, handwritten document understanding, video understanding, medical imaging, plant diseases, and remote sensing-based land use understanding, to evaluate broad scenario generalizability. CAMEL-Bench comprises around 29,036 questions that are filtered from a larger pool of samples, where the quality is manually verified by native speakers to ensure reliable model assessment. We conduct evaluations of both closed-source LMMs, including the GPT-4 series, and open-source LMMs. Our analysis reveals the need for substantial improvement, especially among the best open-source models, with even the closed-source GPT-4o achieving an overall score of 62%. Our benchmark and evaluation scripts are open-sourced.
Precision Soil Quality Analysis Using Transformer-based Data Fusion Strategies: A Systematic Review
Saki, Mahdi, Keshavarz, Rasool, Franklin, Daniel, Abolhasan, Mehran, Lipman, Justin, Shariati, Negin
[...] the transformer-based data fusion techniques in agricultural remote sensing (RS), with a particular focus on soil analysis. Utilizing a systematic, data-driven approach, we demonstrate that transformers have significantly outperformed conventional deep learning and machine learning methods since 2022, achieving prediction performance between 92% and 97%. The review is specifically focused on soil analysis, due to the importance of soil condition in optimizing crop productivity and ensuring sustainable farming practices. Transformer-based models have shown remarkable capabilities in handling complex multivariate soil data, improving the accuracy of soil moisture prediction, soil element analysis, and other soil-related applications. This systematic review primarily focuses on 1) analysing research trends and patterns in the literature, both chronologically and technically, and 2) [...]
[...] implementation of PA, also known as smart farming, relies on the ability to collect, process, and analyse spatial and temporal data to optimize field management practices (Cisternas et al., 2020; Pyingkodi et al., 2022). Despite its enormous potential, the adoption of PA remains below expectations due to factors such as high initial investment costs, the complexity of IT, and the need for specialized knowledge (Cisternas et al., 2020). Remote sensing (RS) has seen rapid advancements and widespread adoption in PA, offering high-resolution data for applications ranging from crop monitoring to irrigation management (Sishodia et al., 2020). Remote sensing has proven to be an effective tool for capturing and monitoring the spectral and temporal properties of the land surface influenced by human activities at different spatial and temporal scales (Bégué et al., 2018).
Automated Road Extraction from Satellite Imagery Integrating Dense Depthwise Dilated Separable Spatial Pyramid Pooling with DeepLabV3+
Mahara, Arpan, Khan, Md Rezaul Karim, Rishe, Naphtali D., Wang, Wenjia, Sadjadi, Seyed Masoud
Road Extraction is a sub-domain of Remote Sensing applications; it is a subject of extensive and ongoing research. The procedure of automatically extracting roads from satellite imagery encounters significant challenges due to the multi-scale and diverse structures of roads; improvement in this field is needed. The DeepLab series, known for its proficiency in semantic segmentation due to its efficiency in interpreting multi-scale objects' features, addresses some of these challenges caused by the varying nature of roads. The present work proposes the utilization of DeepLabV3+, the latest version of the DeepLab series, by introducing an innovative Dense Depthwise Dilated Separable Spatial Pyramid Pooling (DenseDDSSPP) module and integrating it in place of the conventional Atrous Spatial Pyramid Pooling (ASPP) module. This modification enhances the extraction of complex road structures from satellite images. This study hypothesizes that the integration of DenseDDSSPP, combined with an appropriately selected backbone network and a Squeeze-and-Excitation block, will generate an efficient dense feature map by focusing on relevant features, leading to more precise and accurate road extraction from Remote Sensing images. The results section presents a comparison of our model's performance against state-of-the-art models, demonstrating better results that highlight the effectiveness and success of the proposed approach.
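To illustrate the dilated (atrous) convolution idea that both ASPP and the proposed DenseDDSSPP module build on, here is a minimal one-dimensional sketch in plain Python. It is not the authors' module: the function names, the toy signal, and the kernel are assumptions chosen for illustration, and the depthwise-separable and dense-connection parts are omitted. It only shows how a dilation rate widens the span a fixed-size kernel covers.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid (no padding) 1D convolution with gaps of `dilation` between taps."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Hypothetical toy input: a linear ramp and a simple difference kernel.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense_out = dilated_conv1d(signal, kernel, dilation=1)   # taps at i, i+1, i+2
atrous_out = dilated_conv1d(signal, kernel, dilation=2)  # taps at i, i+2, i+4
```

With the same three weights, the dilation-2 kernel spans five input positions instead of three, which is how pyramid pooling modules collect multi-scale context without adding parameters.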
SAda-Net: A Self-Supervised Adaptive Stereo Estimation CNN For Remote Sensing Image Data
Hirner, Dominik, Fraundorfer, Friedrich
Stereo estimation has made many advancements in recent years with the introduction of deep learning. However, the traditional supervised approach to deep learning requires the creation of accurate and plentiful ground-truth data, which is expensive to create and not available in many situations. This is especially true for remote sensing applications, where there is an excess of available data without proper ground truth. To tackle this problem, we propose a self-supervised CNN with self-improving adaptive abilities. In the first iteration, the created disparity map is inaccurate and noisy. Leveraging the left-right consistency check, we get a sparse but more accurate disparity map, which is used as an initial pseudo ground-truth. This pseudo ground-truth is then adapted and updated after every epoch in the training step of the network. We use the sum of inconsistent points to track the network convergence.
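The left-right consistency check mentioned above can be sketched as follows. This is a generic formulation in plain Python, not the paper's implementation: the disparity sign convention (a left pixel at column x matches the right pixel at column x - d), the `max_diff` threshold, and the function name are all assumptions. Pixels that fail the check are dropped, giving a sparse map, and the number of dropped pixels is returned as a simple convergence indicator.

```python
def left_right_consistency(disp_left, disp_right, max_diff=1.0):
    """Return (sparse_disparity, inconsistent_count).

    disp_left and disp_right are 2D lists of per-pixel disparities.
    Pixels that fail the check are set to None in the sparse map.
    """
    height = len(disp_left)
    width = len(disp_left[0])
    sparse = [[None] * width for _ in range(height)]
    inconsistent = 0
    for y in range(height):
        for x in range(width):
            d_left = disp_left[y][x]
            # Column in the right image that left pixel x is matched to.
            x_right = int(round(x - d_left))
            in_bounds = 0 <= x_right < width
            if in_bounds and abs(d_left - disp_right[y][x_right]) <= max_diff:
                sparse[y][x] = d_left
            else:
                inconsistent += 1
    return sparse, inconsistent


# Hypothetical one-row disparity maps.
left = [[0.0, 1.0, 1.0, 3.0]]
right = [[1.0, 1.0, 0.0, 0.0]]

sparse, bad = left_right_consistency(left, right)
strict_sparse, strict_bad = left_right_consistency(left, right, max_diff=0.5)
```

A tighter threshold keeps fewer pixels but makes the surviving pseudo ground-truth more trustworthy; the inconsistent count should fall across epochs if the network is converging.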
Groningen: Spatial Prediction of Rock Gas Saturation by Leveraging Selected and Augmented Well and Seismic Data with Classifier Ensembles
One of the key aspects of successful field exploration and monitoring of reservoir development is the spatial prediction of hydrocarbon saturation of geological structures. Traditional prediction methods based on various types of elastic inversion of seismic data may be limited in conditions of a complex geological structure and insufficient coverage of the studied space with well data. In such situations, machine learning algorithms can become an effective tool for the nonlinear, multidimensional generalization of knowledge obtained by geophysical methods in the well space to the entire territory covered by 3D seismic surveys. The study proposes a new approach to knowledge transfer, which consists in predicting the probability of gas saturation of the territory using ensembles of classifiers trained on data from logging studies of hydrocarbon saturation along the well trajectory. Attributes of the seismic field are used as predictors.
A Surface Adaptive First-Look Inspection Planner for Autonomous Remote Sensing of Open-Pit Mines
Viswanathan, Vignesh Kottayam, Sumathy, Vidya, Kanellakis, Christoforos, Nikolakopoulos, George
In this work, we present an autonomous inspection framework for remote sensing tasks in active open-pit mines. Specifically, the contributions are focused towards developing a methodology where an initial approximate operator-defined inspection plan is exploited by an online view-planner to predict an inspection path that can adapt to changes in the current mine-face morphology caused by routine mining activities. The proposed inspection framework leverages instantaneous 3D LiDAR and localization measurements coupled with a modelled sensor footprint for view-planning that satisfies the desired viewing and photogrammetric conditions. The efficacy of the proposed framework has been demonstrated through simulation in the Feiring-Bruk open-pit mine environment and hardware-based outdoor experimental trials. The video showcasing the performance of the proposed work can be found here: https://youtu.be/uWWbDfoBvFc
LKASeg: Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections
Xiang, Xuezhi, Ning, Yibo, Zhang, Lei, Ombati, Denis, Himu, Himaloy, Zhen, Xiantong
Semantic segmentation of remote sensing images is a fundamental task in geospatial research. However, widely used Convolutional Neural Networks (CNNs) and Transformers have notable drawbacks: CNNs may be limited by insufficient remote sensing modeling capability, while Transformers face challenges due to computational complexity. In this paper, we propose a remote-sensing image semantic segmentation network named LKASeg, which combines Large Kernel Attention (LKA) and Full-Scale Skip Connections (FSC). Specifically, we propose a decoder based on LKA, which extracts global features while avoiding the computational overhead of self-attention and providing channel adaptability. To achieve full-scale feature learning and fusion, we apply FSC between the encoder and decoder. We conducted experiments by combining the LKA-based decoder with FSC. On the ISPRS Vaihingen dataset, the model achieved mF1 and mIoU scores of 90.33% and 82.77%, respectively.
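For readers unfamiliar with the reported metrics, the sketch below shows how mF1 and mIoU are commonly derived from a pixel-level confusion matrix. The confusion matrix here is a made-up two-class example, not data from the paper, and the abstract does not state which classes are averaged, so the plain mean over all classes is an assumption.

```python
def per_class_scores(confusion):
    """Compute per-class F1 and IoU.

    confusion[i][j] = number of pixels of true class i predicted as class j.
    """
    n = len(confusion)
    f1_scores, iou_scores = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp     # pixels wrongly labeled c
        f1_denominator = 2 * tp + fp + fn
        iou_denominator = tp + fp + fn
        f1_scores.append(2 * tp / f1_denominator if f1_denominator else 0.0)
        iou_scores.append(tp / iou_denominator if iou_denominator else 0.0)
    return f1_scores, iou_scores


def mean(values):
    return sum(values) / len(values)


# Hypothetical two-class confusion matrix (rows: truth, columns: prediction).
confusion = [[8, 2],
             [1, 9]]

f1_scores, iou_scores = per_class_scores(confusion)
m_f1 = mean(f1_scores)
m_iou = mean(iou_scores)
```

Because IoU counts each error once against the intersection while F1 counts true positives twice, mIoU is never higher than mF1 for the same predictions, which is consistent with the gap between the two reported scores.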
LCD-Net: A Lightweight Remote Sensing Change Detection Network Combining Feature Fusion and Gating Mechanism
Liu, Wenyu, Li, Jindong, Wang, Haoji, Tan, Run, Fu, Yali, Tian, Qichuan
Remote sensing image change detection (RSCD) is crucial for monitoring dynamic surface changes, with applications ranging from environmental monitoring to disaster assessment. While traditional CNN-based methods have improved detection accuracy, they often suffer from high computational complexity and large parameter counts, limiting their use in resource-constrained environments. To address these challenges, we propose a Lightweight remote sensing Change Detection Network (LCD-Net in short) that reduces model size and computational cost while maintaining high detection performance. LCD-Net employs MobileNetV2 as the encoder to efficiently extract features from bitemporal images. A Temporal Interaction and Fusion Module (TIF) enhances the interaction between bitemporal features, improving temporal context awareness. Additionally, the Feature Fusion Module (FFM) aggregates multiscale features to better capture subtle changes while suppressing background noise. The Gated Mechanism Module (GMM) in the decoder further enhances feature learning by dynamically adjusting channel weights, emphasizing key change regions. Experiments on LEVIR-CD+, SYSU, and S2Looking datasets show that LCD-Net achieves competitive performance with just 2.56M parameters and 4.45G FLOPs, making it well-suited for real-time applications in resource-limited settings. The code is available at https://github.com/WenyuLiu6/LCD-Net.
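The idea of "dynamically adjusting channel weights" can be illustrated with a deliberately simplified gate in plain Python. This is not the GMM from LCD-Net, whose internals the abstract does not describe: the per-channel scalar weight and bias, the global-average descriptor, and the function names are all assumptions, loosely in the spirit of squeeze-and-excitation style gating.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(features, weights, biases):
    """Scale each channel by a gate computed from its own mean activation.

    features: list of channels, each a flat list of activations.
    weights, biases: one hypothetical learnable scalar per channel.
    """
    gated = []
    for channel, w, b in zip(features, weights, biases):
        descriptor = sum(channel) / len(channel)  # global average pooling
        gate = sigmoid(w * descriptor + b)        # gate value in (0, 1)
        gated.append([gate * value for value in channel])
    return gated


# Hypothetical two-channel feature map, flattened per channel.
features = [[1.0, 3.0], [0.0, 0.0]]
gated = channel_gate(features, weights=[0.0, 1.0], biases=[0.0, 0.0])
```

In a trained network the weights and biases would be learned, so channels that respond to changed regions receive gates near 1 and background channels are suppressed toward 0.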
Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery
Cui, Kangning, Tang, Wei, Zhu, Rongkun, Wang, Manqi, Larsen, Gregory D., Pauca, Victor P., Alqahtani, Sarra, Yang, Fan, Segurado, David, Fine, Paul, Karubian, Jordan, Chan, Raymond H., Plemmons, Robert J., Morel, Jean-Michel, Silman, Miles R.
Understanding the spatial distribution of palms within tropical forests is essential for effective ecological monitoring, conservation strategies, and the sustainable integration of natural forest products into local and global supply chains. However, the analysis of remotely sensed data in these environments faces significant challenges, such as overlapping palm and tree crowns, uneven shading across the canopy surface, and the heterogeneous nature of the forest landscapes, which often affect the performance of palm detection and segmentation algorithms. To overcome these issues, we introduce PalmDSNet, a deep learning framework for real-time detection, segmentation, and counting of canopy palms. Additionally, we employ a bimodal reproduction algorithm that simulates palm spatial propagation to further enhance the understanding of these point patterns using PalmDSNet's results. We used UAV-captured imagery to create orthomosaics from 21 sites across western Ecuadorian tropical forests, covering a gradient from the everwet Chocó forests near Colombia to the drier forests of southwestern Ecuador. Expert annotations were used to create a comprehensive dataset, including 7,356 bounding boxes on image patches and 7,603 palm centers across five orthomosaics, encompassing a total area of 449 hectares. By combining PalmDSNet with the bimodal reproduction algorithm, which optimizes parameters for both local and global spatial variability, we effectively simulate the spatial distribution of palms in diverse and dense tropical environments, validating its utility for advanced applications in tropical forest monitoring and remote sensing analysis.