Geophysical Analysis & Survey
Centroid-UNet: Detecting Centroids in Aerial Images
Deshapriya, N. Lakmal, Tran, Dan, Reddy, Sriram, Gunasekara, Kavinda
In many applications of aerial/satellite image analysis (remote sensing), generating the exact shapes of objects is a cumbersome task. Most remote sensing applications, such as object counting, require only location estimates of objects. Hence, locating object centroids in aerial/satellite images is a simple solution for tasks where the object's exact shape is not necessary. This study therefore assesses the feasibility of using deep neural networks to locate object centroids in satellite images. Our model, Centroid-UNet, is based on the classic U-Net semantic segmentation architecture, which we modified and adapted into a centroid detection model while preserving the simplicity of the original. We tested and evaluated the model in two case studies involving aerial/satellite images: building centroid detection and coconut tree centroid detection. Our evaluation results reach accuracy comparable to that of other methods while offering simplicity. The code and models developed in this study are available in the Centroid-UNet GitHub repository: https://github.com/gicait/centroid-unet
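As a rough illustration of how centroid locations could be read from a network output, the sketch below assumes the model produces a per-pixel centroid heatmap; the abstract does not describe the output encoding, so this is an assumption. The function name `find_centroids`, the threshold value, and the toy heatmap are all hypothetical.

```python
from typing import List, Tuple


def find_centroids(
    heatmap: List[List[float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels that reach `threshold` and are not
    exceeded by any pixel in their 8-neighbourhood.

    Note: a flat plateau yields one peak per plateau pixel; a real
    pipeline would need an extra suppression step for that case.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            is_peak = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and heatmap[nr][nc] > value:
                        is_peak = False
            if is_peak:
                peaks.append((r, c))
    return peaks


# Toy 5x5 heatmap containing two blobs (hypothetical values).
toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.1],
    [0.0, 0.0, 0.2, 0.8, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
]
print(find_centroids(toy))  # [(1, 1), (3, 3)]
```

For a counting task, the object count is then simply the length of the returned list.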
Image-to-image Translation as a Unique Source of Knowledge
Image-to-image (I2I) translation is an established way of translating data from one domain to another. However, when the domains are as dissimilar as SAR and optical satellite imagery, it is still unclear how usable the translated images are in the target domain and how much of the origin domain carries over to it. This article addresses these questions by translating labelled datasets from the optical domain to the SAR domain with different state-of-the-art I2I algorithms, learning from the transferred features in the destination domain, and then evaluating how much of the original dataset was transferred. In addition, stacking is proposed as a way of combining the knowledge learned from the different I2I translations and is evaluated against single models.
RSBNet: One-Shot Neural Architecture Search for A Backbone Network in Remote Sensing Image Recognition
Peng, Cheng, Li, Yangyang, Shang, Ronghua, Jiao, Licheng
Recently, a massive number of deep learning based approaches have been successfully applied to various remote sensing image (RSI) recognition tasks. However, most existing advances of deep learning methods in the RSI field rely heavily on features extracted by a manually designed backbone network, which severely limits the potential of deep learning models due to the complexity of RSI and the limitation of prior knowledge. In this paper, we investigate a new design paradigm for the backbone architecture in RSI recognition tasks, including scene classification, land-cover classification and object detection. We propose RSBNet, a novel one-shot architecture search framework based on a weight-sharing strategy and an evolutionary algorithm, which consists of three stages. First, a supernet constructed in a layer-wise search space is pre-trained on a self-assembled large-scale RSI dataset using an ensemble single-path training strategy. Next, the pre-trained supernet is equipped with different recognition heads through the switchable recognition module and fine-tuned on each target dataset to obtain task-specific supernets. Finally, we search for the optimal backbone architecture for each recognition task using the evolutionary algorithm, without any network training. Extensive experiments on five benchmark datasets for different recognition tasks show the effectiveness of the proposed search paradigm and demonstrate that the searched backbone can flexibly adapt to different RSI recognition tasks and achieve impressive performance.
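The third stage, searching over a layer-wise space without training any network, can be pictured with a generic evolutionary loop. This is only a minimal sketch under stated assumptions, not RSBNet's actual procedure: `evaluate` is a placeholder for scoring a candidate with weights inherited from the fine-tuned supernet, and the population size, mutation probability, selection rule and toy fitness are all hypothetical.

```python
import random
from typing import Callable, Tuple

Architecture = Tuple[int, ...]  # one operation index per layer


def evolutionary_search(
    evaluate: Callable[[Architecture], float],
    num_layers: int,
    num_choices: int,
    population_size: int = 20,
    generations: int = 15,
    mutation_prob: float = 0.1,
    seed: int = 0,
) -> Architecture:
    """Search layer-wise operation choices; higher `evaluate` is better."""
    rng = random.Random(seed)

    def random_arch() -> Architecture:
        return tuple(rng.randrange(num_choices) for _ in range(num_layers))

    def mutate(arch: Architecture) -> Architecture:
        # Resample each layer's operation with probability mutation_prob.
        return tuple(
            rng.randrange(num_choices) if rng.random() < mutation_prob else op
            for op in arch
        )

    def crossover(a: Architecture, b: Architecture) -> Architecture:
        # Pick each layer's operation from one of the two parents.
        return tuple(rng.choice(pair) for pair in zip(a, b))

    population = [random_arch() for _ in range(population_size)]
    for _ in range(generations):
        # Keep the better half as parents (elitism), refill with offspring.
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(parents)))
            else:
                children.append(
                    crossover(rng.choice(parents), rng.choice(parents))
                )
        population = parents + children
    return max(population, key=evaluate)


def toy_fitness(arch: Architecture) -> float:
    """Hypothetical stand-in score: counts layers that use operation 2."""
    return float(sum(1 for op in arch if op == 2))


best = evolutionary_search(toy_fitness, num_layers=6, num_choices=4)
print(best, toy_fitness(best))
```

Because candidates are only evaluated, never trained, the cost of the search is dominated by how expensive `evaluate` is on the validation data.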
Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery
Object detection is a canonical task in computer vision, as well as in remote sensing. Object detection in remote sensing imagery deals with detecting instances of visual objects of certain classes, most of which are man-made: buildings, airplanes, ships, and vehicles, to name a few. This technology has been widely used in many civilian and military fields, such as port and airport flow monitoring, traffic diversion, urban planning, and lost ship search and rescue. Traditional machine learning (ML) schemes based on the encoding of handcrafted features (e.g., textures, color histogram, or more complex HOG Dalal and Triggs (2005), SIFT Lowe (2004), Haar Viola and Jones (2001), ACF Dollár, Appel, Belongie and Perona (2014), etc.) can only generate shallow to middle features with limited representativity. Recently, with the rapid development of deep learning (DL), convolutional neural networks (CNNs) have become a new and powerful approach for feature extraction and have greatly improved the performance of object detection. Current CNN-based object detection methods can be roughly divided into two streams: two-stage schemes and one-stage schemes. Two-stage detectors, such as R-CNN Girshick, Donahue, Darrell and Malik (2014), Fast R-CNN Girshick (2015), Faster R-CNN Ren, He, Girshick and Sun (2017) and other detectors Cai and Vasconcelos (2018); Pang, Chen, Shi, Feng, Ouyang and Lin (2019); Li, Chen, Wang and Zhang (2019b), divide detection into localization and recognition stages, and thus have one more region-proposal step than single-stage detectors.
Joint Characterization of the Cryospheric Spectral Feature Space
Small, Christopher, Sousa, Daniel
Hyperspectral feature spaces are useful for many remote sensing applications ranging from spectral mixture modeling to discrete thematic classification. In such cases, characterization of the feature space dimensionality, geometry and topology can provide guidance for effective model design. The objective of this study is to compare and contrast two approaches for identifying feature space basis vectors via dimensionality reduction. These approaches can be combined to render a joint characterization that reveals spectral properties not apparent using either approach alone. We use a diverse collection of AVIRIS-NG reflectance spectra of the snow-firn-ice continuum to illustrate the utility of joint characterization and identify physical properties inferred from the spectra. Spectral feature spaces combining principal components (PCs) and t-distributed Stochastic Neighbor Embeddings (t-SNEs) provide physically interpretable dimensions representing the global (PC) structure of cryospheric reflectance properties and local (t-SNE) manifold structures revealing clustering not resolved in the global continuum. Joint characterization reveals distinct continua for snow-firn gradients on different parts of the Greenland Ice Sheet and multiple clusters of ice reflectance properties common to both glacier and sea ice in different locations. Clustering revealed in t-SNE feature spaces, and extended to the joint characterization, distinguishes differences in spectral curvature specific to location within the snow accumulation zone, and BRDF effects related to view geometry. The ability of PC+t-SNE joint characterization to produce physically interpretable spectral feature spaces that reveal global topology while preserving local manifold structures suggests that this characterization might be extended to the much higher dimensional hyperspectral feature space of all terrestrial land cover.
Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
Akiva, Peri, Purri, Matthew, Leotta, Matthew
Self-supervised learning aims to learn image feature representations without the usage of manually annotated labels. It is often used as a precursor step to obtain useful initial network weights which contribute to faster convergence and superior performance of downstream tasks. While self-supervision allows one to reduce the domain gap between supervised and unsupervised learning without the usage of labels, the self-supervised objective still requires a strong inductive bias to downstream tasks for effective transfer learning. In this work, we present our material and texture based self-supervision method named MATTER (MATerial and TExture Representation Learning), which is inspired by classical material and texture methods. Material and texture can effectively describe any surface, including its tactile properties, color, and specularity. By extension, effective representation of material and texture can describe other semantic classes strongly associated with said material and texture. MATTER leverages multi-temporal, spatially aligned remote sensing imagery over unchanged regions to learn invariance to illumination and viewing angle as a mechanism to achieve consistency of material and texture representation. We show that our self-supervision pre-training method allows for up to 24.22% and 6.33% performance increase in unsupervised and fine-tuned setups, and up to 76% faster convergence on change detection, land cover classification, and semantic segmentation tasks.
Panoptic Segmentation Meets Remote Sensing
de Carvalho, Osmar Luiz Ferreira, Júnior, Osmar Abílio de Carvalho, Silva, Cristiano Rosa e, de Albuquerque, Anesmar Olino, Santana, Nickolas Castro, Borges, Dibio Leandro, Gomes, Roberto Arnaldo Trancoso, Guimarães, Renato Fontes
Panoptic segmentation combines instance and semantic predictions, allowing the detection of "things" and "stuff" simultaneously. Effectively approaching panoptic segmentation in remotely sensed data is promising for many challenging problems since it allows continuous mapping and specific target counting. Several difficulties have prevented the growth of this task in remote sensing: (a) most algorithms are designed for traditional images, (b) image labelling must encompass "things" and "stuff" classes, and (c) the annotation format is complex. Thus, aiming to solve these difficulties and increase the operability of panoptic segmentation in remote sensing, this study has five objectives: (1) create a novel data preparation pipeline for panoptic segmentation, (2) propose annotation conversion software to generate panoptic annotations, (3) propose a novel dataset on urban areas, (4) modify Detectron2 for the task, and (5) evaluate difficulties of this task in the urban setting. We used an aerial image with a 0.24-meter spatial resolution considering 14 classes. Our pipeline considers three image inputs, and the proposed software uses point shapefiles for creating samples in the COCO format. Our study generated 3,400 samples with 512x512 pixel dimensions. We used the Panoptic-FPN with two backbones (ResNet-50 and ResNet-101), and the model evaluation considered semantic, instance, and panoptic metrics. We obtained 93.9, 47.7, and 64.9 for the mean IoU, box AP, and PQ, respectively. Our study presents the first effective pipeline for panoptic segmentation and an extensive database for other researchers to use and deal with other data or related problems requiring a thorough scene understanding.
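The PQ figure reported above refers to panoptic quality. As commonly defined, for one class it is the sum of IoUs over matched segment pairs divided by TP + 0.5 FP + 0.5 FN, where a predicted and a ground-truth segment count as matched when their IoU exceeds 0.5. The sketch below computes that quantity from already-matched IoUs; the function name and the toy numbers are hypothetical and do not come from the study.

```python
from typing import Sequence


def panoptic_quality(
    matched_ious: Sequence[float],
    num_false_positives: int,
    num_false_negatives: int,
) -> float:
    """Per-class PQ from the IoUs of matched (true positive) segment pairs.

    `matched_ious` is assumed to contain only matches with IoU > 0.5.
    """
    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        return 0.0  # class absent from both prediction and ground truth
    return sum(matched_ious) / denominator


# Toy example: three matched segments, one spurious, one missed.
pq = panoptic_quality([0.9, 0.8, 0.7], num_false_positives=1, num_false_negatives=1)
print(f"{pq:.2f}")  # 0.60
```

An overall score is usually obtained by averaging the per-class values over all classes.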
Generating gapless land surface temperature with a high spatio-temporal resolution by fusing multi-source satellite-observed and model-simulated data
Ma, Jun, Shen, Huanfeng, Wu, Penghai, Wu, Jingan, Gao, Meiling, Meng, Chunlei
Land surface temperature (LST) is a key parameter when monitoring land surface processes. However, cloud contamination and the tradeoff between the spatial and temporal resolutions greatly impede access to high-quality thermal infrared (TIR) remote sensing data. Despite the massive efforts made to solve these dilemmas, it is still difficult to generate LST estimates with concurrent spatial completeness and a high spatio-temporal resolution. Land surface models (LSMs) can be used to simulate gapless LST with a high temporal resolution, but this usually comes with a low spatial resolution. In this paper, we present an integrated temperature fusion framework for satellite-observed and LSM-simulated LST data to map gapless LST at a 60-m spatial resolution and half-hourly temporal resolution. The global linear model (GloLM) and the diurnal land surface temperature cycle (DTC) model are applied as preprocessing steps for sensor normalization and temporal normalization, respectively, between the different LST data. The Landsat LST, Moderate Resolution Imaging Spectroradiometer (MODIS) LST, and Community Land Model Version 5.0 (CLM 5.0)-simulated LST are then fused using a filter-based spatio-temporal integrated fusion model. Evaluations were carried out in an urban-dominated region (the city of Wuhan in China) and a natural-dominated region (the Heihe River Basin in China), in terms of accuracy, spatial variability, and diurnal temporal dynamics. Results indicate that the fused LST is highly consistent with actual Landsat LST data (and, in parentheses, with in situ LST measurements), with a Pearson correlation coefficient of 0.94 (0.97-0.99), a mean absolute error of 0.71-0.98 K (0.82-3.17 K), and a root-mean-square error of 0.97-1.26 K (1.09-3.97 K).
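The three agreement statistics quoted above (Pearson correlation coefficient, mean absolute error, root-mean-square error) follow their standard definitions. A minimal sketch of how they could be computed for paired fused and reference LST values is given below; the function name and the sample temperatures are hypothetical and are not data from the study.

```python
import math
from typing import Sequence, Tuple


def agreement_metrics(
    predicted: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (Pearson r, MAE, RMSE) for paired samples in the same unit."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("Pearson r is undefined for constant input")
    pearson = cov / math.sqrt(var_p * var_r)
    mae = sum(abs(p - r) for p, r in zip(predicted, reference)) / n
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
    return pearson, mae, rmse


# Hypothetical LST values in kelvin: the fused product is biased by -1 K.
fused_lst = [300.0, 302.0, 304.0, 306.0]
reference_lst = [301.0, 303.0, 305.0, 307.0]
print(agreement_metrics(fused_lst, reference_lst))  # (1.0, 1.0, 1.0)
```

The toy case also shows why correlation alone is not enough: a constant bias leaves Pearson r at 1 while MAE and RMSE both report the 1 K offset.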
Bounding Box-Free Instance Segmentation Using Semi-Supervised Learning for Generating a City-Scale Vehicle Dataset
de Carvalho, Osmar Luiz Ferreira, Júnior, Osmar Abílio de Carvalho, de Albuquerque, Anesmar Olino, Santana, Nickolas Castro, Borges, Dibio Leandro, Gomes, Roberto Arnaldo Trancoso, Guimarães, Renato Fontes
Vehicle classification is a hot computer vision topic, with studies ranging from ground-view up to top-view imagery. In remote sensing, the usage of top-view images allows for understanding city patterns, vehicle concentration, and traffic management, among others. However, there are some difficulties when aiming for pixel-wise classification: (a) most vehicle classification studies use object detection methods, and most publicly available datasets are designed for this task, (b) creating instance segmentation datasets is laborious, and (c) traditional instance segmentation methods underperform on this task since the objects are small. Thus, the present research objectives are: (1) propose a novel semi-supervised iterative learning approach using GIS software, (2) propose a box-free instance segmentation approach, and (3) provide a city-scale vehicle dataset. The iterative learning procedure consists of the following steps: (1) label a small number of vehicles, (2) train on those samples, (3) use the model to classify the entire image, (4) convert the image prediction into a polygon shapefile, (5) correct some areas with errors and include them in the training data, and (6) repeat until results are satisfactory. To separate instances, we considered vehicle interior and vehicle borders, and the DL model was the U-Net with the EfficientNet-B7 backbone. When the borders are removed, each vehicle interior becomes isolated, allowing for unique object identification. To recover the deleted 1-pixel borders, we proposed a simple method to expand each prediction. The results show better pixel-wise metrics when compared to Mask R-CNN (82% against 67% in IoU). On per-object analysis, the overall accuracy, precision, and recall were greater than 90%. This pipeline applies to any remote sensing target, being very efficient for segmentation and generating datasets.
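The box-free separation idea described above (drop the border class, label each isolated interior, then grow each instance back by one pixel) can be sketched as follows. The abstract does not specify the expansion method, so the one-step growth into adjacent border pixels used here is an assumption, as are the class encoding, the function names and the toy map.

```python
from collections import deque
from typing import List

BACKGROUND, INTERIOR, BORDER = 0, 1, 2  # hypothetical class encoding


def label_interiors(class_map: List[List[int]]) -> List[List[int]]:
    """Give every 4-connected interior region its own positive label."""
    rows, cols = len(class_map), len(class_map[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if class_map[r][c] != INTERIOR or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill
                cr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and class_map[nr][nc] == INTERIOR
                            and not labels[nr][nc]):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


def separate_instances(class_map: List[List[int]]) -> List[List[int]]:
    """Label interiors, then expand each label into touching border pixels."""
    rows, cols = len(class_map), len(class_map[0])
    labels = label_interiors(class_map)

    def first_neighbor_label(r: int, c: int) -> int:
        # A border pixel shared by two instances goes to the first one found.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc]:
                    return labels[nr][nc]
        return 0

    expanded = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            if class_map[r][c] == BORDER:
                expanded[r][c] = first_neighbor_label(r, c)
    return expanded


# Two toy vehicles that share a border column.
toy_map = [
    [2, 2, 2, 2, 2],
    [2, 1, 2, 1, 2],
    [2, 1, 2, 1, 2],
    [2, 2, 2, 2, 2],
]
for row in separate_instances(toy_map):
    print(row)
```

Without the border class, the two toy vehicles would merge into a single connected region; with it, they come out as instances 1 and 2.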
TorchGeo: deep learning with geospatial data
Remotely sensed geospatial data are critical for applications including precision agriculture, urban planning, disaster monitoring and response, and climate change research, among others. Deep learning methods are particularly promising for modeling many remote sensing tasks given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery available. However, the variance in data collection methods and handling of geospatial metadata make the application of deep learning methodology to remotely sensed data nontrivial. For example, satellite imagery often includes additional spectral bands beyond red, green, and blue and must be joined to other geospatial data sources that can have differing coordinate systems, bounds, and resolutions. To help realize the potential of deep learning for remote sensing applications, we introduce TorchGeo, a Python library for integrating geospatial data into the PyTorch deep learning ecosystem. TorchGeo provides data loaders for a variety of benchmark datasets, composable datasets for generic geospatial data sources, samplers for geospatial data, and transforms that work with multispectral imagery. TorchGeo is also the first library to provide pre-trained models for multispectral satellite imagery (e.g., models that use all bands from the Sentinel-2 satellites), allowing for advances in transfer learning on downstream remote sensing tasks with limited labeled data. We use TorchGeo to create reproducible benchmark results on existing datasets and benchmark our proposed method for preprocessing geospatial imagery on-the-fly. TorchGeo is open-source and available on GitHub: https://github.com/microsoft/torchgeo.
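To make the idea of joining layers with differing bounds and sampling patches from them concrete, the sketch below intersects two bounding boxes and draws a random square patch from the overlap. It is a plain-Python illustration of the concept only and is not TorchGeo's API: the `Bounds` type, both functions and the coordinates are hypothetical, and both layers are assumed to already share one coordinate system.

```python
import random
from typing import NamedTuple, Optional


class Bounds(NamedTuple):
    """Axis-aligned extent in a shared (assumed) coordinate system."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersect(a: Bounds, b: Bounds) -> Optional[Bounds]:
    """Return the overlapping extent of two layers, or None if disjoint."""
    minx, miny = max(a.minx, b.minx), max(a.miny, b.miny)
    maxx, maxy = min(a.maxx, b.maxx), min(a.maxy, b.maxy)
    if minx >= maxx or miny >= maxy:
        return None
    return Bounds(minx, miny, maxx, maxy)


def sample_patch(region: Bounds, size: float, rng: random.Random) -> Bounds:
    """Draw a random size-by-size window that lies fully inside `region`."""
    if size > region.maxx - region.minx or size > region.maxy - region.miny:
        raise ValueError("patch is larger than the region")
    x = rng.uniform(region.minx, region.maxx - size)
    y = rng.uniform(region.miny, region.maxy - size)
    return Bounds(x, y, x + size, y + size)


# Hypothetical extents of an imagery layer and a label layer.
imagery = Bounds(0.0, 0.0, 100.0, 100.0)
labels = Bounds(50.0, 50.0, 150.0, 150.0)
overlap = intersect(imagery, labels)
print(overlap)  # Bounds(minx=50.0, miny=50.0, maxx=100.0, maxy=100.0)

patch = sample_patch(overlap, size=10.0, rng=random.Random(0))
print(patch)
```

Sampling only inside the overlap guarantees that every patch has both imagery and labels available, which is the property a geospatial sampler needs when layers cover different areas.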