Goto

Collaborating Authors

 Geophysical Analysis & Survey


Semantic Residual Prompts for Continual Learning

arXiv.org Artificial Intelligence

Prompt-tuning methods for Continual Learning (CL) freeze a large pre-trained model and focus training on a few parameter vectors termed prompts. Most of these methods organize these vectors in a pool of key-value pairs, and use the input image as query to retrieve the prompts (values). However, as keys are learned while tasks progress, the prompting selection strategy is itself subject to catastrophic forgetting, an issue often overlooked by existing approaches. For instance, prompts introduced to accommodate new tasks might end up interfering with previously learned prompts. To make the selection strategy more stable, we ask a foundational model (CLIP) to select our prompt within a two-level adaptation mechanism. Specifically, the first level leverages standard textual prompts for the CLIP textual encoder, leading to stable class prototypes. The second level, instead, uses these prototypes along with the query image as keys to index a second pool. The retrieved prompts serve to adapt a pre-trained ViT, granting plasticity. In doing so, we also propose a novel residual mechanism to transfer CLIP semantics to the ViT layers. Through extensive analysis on established CL benchmarks, we show that our method significantly outperforms both state-of-the-art CL approaches and the zero-shot CLIP test. Notably, our findings hold true even for datasets with a substantial domain gap w.r.t. the pre-training knowledge of the backbone model, as showcased by experiments on satellite imagery and medical datasets.


Bootstrapping Rare Object Detection in High-Resolution Satellite Imagery

arXiv.org Artificial Intelligence

Rare object detection is a fundamental task in applied geospatial machine learning, however is often challenging due to large amounts of high-resolution satellite or aerial imagery and few or no labeled positive samples to start with. This paper addresses the problem of bootstrapping such a rare object detection task assuming there is no labeled data and no spatial prior over the area of interest. We propose novel offline and online cluster-based approaches for sampling patches that are significantly more efficient, in terms of exposing positive samples to a human annotator, than random sampling. We apply our methods for identifying bomas, or small enclosures for herd animals, in the Serengeti Mara region of Kenya and Tanzania. We demonstrate a significant enhancement in detection efficiency, achieving a positive sampling rate increase from 2% (random) to 30%. This advancement enables effective machine learning mapping even with minimal labeling budgets, exemplified by an F1 score on the boma detection task of 0.51 with a budget of 300 total patches.


Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey

arXiv.org Artificial Intelligence

Semantic segmentation is an important and popular research area in computer vision that focuses on classifying pixels in an image based on their semantics. However, supervised deep learning requires large amounts of data to train models and the process of labeling images pixel by pixel is time-consuming and laborious. This review aims to provide a first comprehensive and organized overview of the state-of-the-art research results on pseudo-label methods in the field of semi-supervised semantic segmentation, which we categorize from different perspectives and present specific methods for specific application areas. In addition, we explore the application of pseudo-label technology in medical and remote-sensing image segmentation. Finally, we also propose some feasible future research directions to address the existing challenges.


BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery

arXiv.org Artificial Intelligence

Satellites equipped with optical sensors capture high-resolution imagery, providing valuable insights into various environmental phenomena. In recent years, there has been a surge of research focused on addressing some challenges in remote sensing, ranging from water detection in diverse landscapes to the segmentation of mountainous and terrains. Ongoing investigations goals to enhance the precision and efficiency of satellite imagery analysis. Especially, there is a growing emphasis on developing methodologies for accurate water body detection, snow and clouds, important for environmental monitoring, resource management, and disaster response. Accurate remote sensing data analysis can be challenging due to the presence of clouds in optical sensor-based applications. The quality of resulting products such as applications and research is directly impacted by cloud detection, which plays a key role in the remote sensing data processing pipeline. This paper examines seven cutting-edge semantic segmentation and detection algorithms applied to clouds identification, conducting a benchmark analysis to evaluate their architectural approaches and identify the most performing ones. To increase the model's adaptability, critical elements including the type of imagery and the amount of spectral bands used during training are analyzed. Additionally, this research tries to produce machine learning algorithms that can perform cloud segmentation using only a few spectral bands, including RGB and RGBN-IR combinations. The model's flexibility for a variety of applications and user scenarios is assessed by using imagery from Sentinel-2 and Landsat-8 as datasets. The current study involves a thorough benchmark analysis, evaluating modern deep learning models for cloud detection in remote sensing imagery. The principal objective encompasses the provision of a meticulous and relative evaluation of these models, offering elucidations regarding their proficiencies, deficiencies, and potential deployment utility.


A Synergistic Approach to Wildfire Prevention and Management Using AI, ML, and 5G Technology in the United States

arXiv.org Artificial Intelligence

Over the past few years, wildfires have become a worldwide environmental emergency, resulting in substantial harm to natural habitats and playing a part in the acceleration of climate change. Wildfire management methods involve prevention, response, and recovery efforts. Despite improvements in detection techniques, the rising occurrence of wildfires demands creative solutions for prompt identification and effective control. This research investigates proactive methods for detecting and handling wildfires in the United States, utilizing Artificial Intelligence (AI), Machine Learning (ML), and 5G technology. The specific objective of this research covers proactive detection and prevention of wildfires using advanced technology; Active monitoring and mapping with remote sensing and signaling leveraging on 5G technology; and Advanced response mechanisms to wildfire using drones and IOT devices. This study was based on secondary data collected from government databases and analyzed using descriptive statistics. In addition, past publications were reviewed through content analysis, and narrative synthesis was used to present the observations from various studies. The results showed that developing new technology presents an opportunity to detect and manage wildfires proactively. Utilizing advanced technology could save lives and prevent significant economic losses caused by wildfires. Various methods, such as AI-enabled remote sensing and 5G-based active monitoring, can enhance proactive wildfire detection and management. In addition, super intelligent drones and IOT devices can be used for safer responses to wildfires. This forms the core of the recommendation to the fire Management Agencies and the government.


Intelligent Known and Novel Aircraft Recognition -- A Shift from Classification to Similarity Learning for Combat Identification

arXiv.org Artificial Intelligence

Precise aircraft recognition in low-resolution remote sensing imagery is a challenging yet crucial task in aviation, especially combat identification. This research addresses this problem with a novel, scalable, and AI-driven solution. The primary hurdle in combat identification in remote sensing imagery is the accurate recognition of Novel/Unknown types of aircraft in addition to Known types. Traditional methods, human expert-driven combat identification and image classification, fall short in identifying Novel classes. Our methodology employs similarity learning to discern features of a broad spectrum of military and civilian aircraft. It discerns both Known and Novel aircraft types, leveraging metric learning for the identification and supervised few-shot learning for aircraft type classification. To counter the challenge of limited low-resolution remote sensing data, we propose an end-to-end framework that adapts to the diverse and versatile process of military aircraft recognition by training a generalized embedder in fully supervised manner. Comparative analysis with earlier aircraft image classification methods shows that our approach is effective for aircraft image classification (F1-score Aircraft Type of 0.861) and pioneering for quantifying the identification of Novel types (F1-score Bipartitioning of 0.936). The proposed methodology effectively addresses inherent challenges in remote sensing data, thereby setting new standards in dataset quality. The research opens new avenues for domain experts and demonstrates unique capabilities in distinguishing various aircraft types, contributing to a more robust, domain-adapted potential for real-time aircraft recognition.


Automated Floodwater Depth Estimation Using Large Multimodal Model for Rapid Flood Mapping

arXiv.org Artificial Intelligence

Information on the depth of floodwater is crucial for rapid mapping of areas affected by floods. However, previous approaches for estimating floodwater depth, including field surveys, remote sensing, and machine learning techniques, can be time-consuming and resource-intensive. This paper presents an automated and fast approach for estimating floodwater depth from on-site flood photos. A pre-trained large multimodal model, GPT-4 Vision, was used specifically for estimating floodwater. The input data were flooding photos that contained referenced objects, such as street signs, cars, people, and buildings. Using the heights of the common objects as references, the model returned the floodwater depth as the output. Results show that the proposed approach can rapidly provide a consistent and reliable estimation of floodwater depth from flood photos. Such rapid estimation is transformative in flood inundation mapping and assessing the severity of the flood in near-real time, which is essential for effective flood response strategies.


HSONet:A Siamese foreground association-driven hard case sample optimization network for high-resolution remote sensing image change detection

arXiv.org Artificial Intelligence

In the later training stages, further improvement of the models ability to determine changes relies on how well the change detection (CD) model learns hard cases; however, there are two additional challenges to learning hard case samples: (1) change labels are limited and tend to pointer only to foreground targets, yet hard case samples are prevalent in the background, which leads to optimizing the loss function focusing on the foreground targets and ignoring the background hard cases, which we call imbalance. (2) Complex situations, such as light shadows, target occlusion, and seasonal changes, induce hard case samples, and in the absence of both supervisory and scene information, it is difficult for the model to learn hard case samples directly to accurately obtain the feature representations of the change information, which we call missingness. We propose a Siamese foreground association-driven hard case sample optimization network (HSONet). To deal with this imbalance, we propose an equilibrium optimization loss function to regulate the optimization focus of the foreground and background, determine the hard case samples through the distribution of the loss values, and introduce dynamic weights in the loss term to gradually shift the optimization focus of the loss from the foreground to the background hard cases as the training progresses. To address this missingness, we understand hard case samples with the help of the scene context, propose the scene-foreground association module, use potential remote sensing spatial scene information to model the association between the target of interest in the foreground and the related context to obtain scene embedding, and apply this information to the feature reinforcement of hard cases. Experiments on four public datasets show that HSONet outperforms current state-of-the-art CD methods, particularly in detecting hard case samples.


Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

arXiv.org Artificial Intelligence

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the used explainable AI methods and their objectives, findings, and challenges in Remote Sensing applications is still missing. In this paper, we address this issue by performing a systematic review to identify the key trends of how explainable AI is used in Remote Sensing and shed light on novel explainable AI approaches and emerging directions that tackle specific Remote Sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights in Remote Sensing, and reflect on the approaches used for explainable AI methods evaluation. Our review provides a complete summary of the state-of-the-art in the field. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field of explainable AI in Remote Sensing.


SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery

Neural Information Processing Systems

Unsupervised pre-training methods for large vision models have shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding along with independently masking image patches across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in terms of supervised learning performance on benchmark datasets (up to " 7%), and transfer learning performance on downstream remote sensing tasks, including land cover classification (up to " 14%) and semantic segmentation.