Geophysical Analysis & Survey
Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel
Jiang, Ziyang, Zheng, Tongshu, Liu, Yiling, Carlson, David
It is challenging to guide neural network (NN) learning with prior knowledge. In contrast, many known properties, such as spatial smoothness or seasonality, are straightforward to model by choosing an appropriate kernel in a Gaussian process (GP). Many deep learning applications could be enhanced by modeling such known properties. For example, convolutional neural networks (CNNs) are frequently used in remote sensing, which is subject to strong seasonal effects. We propose to blend the strengths of deep learning and the clear modeling capabilities of GPs by using a composite kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties (e.g., seasonality). We implement this idea by combining a deep network and an efficient mapping based on the Nystrom approximation, which we call Implicit Composite Kernel (ICK). We then adopt a sample-then-optimize approach to approximate the full GP posterior distribution. We demonstrate that ICK has superior performance and flexibility on both synthetic and real-world data sets. We believe that ICK framework can be used to include prior information into neural networks in many applications.
Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck
Songara, Jayesh, Pande, Shivam, Choudhury, Shabnam, Banerjee, Biplab, Velmurugan, Rajbabu
In this research, we deal with the problem of visual question answering (VQA) in remote sensing. While remotely sensed images contain information significant for the task of identification and object detection, they pose a great challenge in their processing because of high dimensionality, volume and redundancy. Furthermore, processing image information jointly with language features adds additional constraints, such as mapping the corresponding image and language features. To handle this problem, we propose a cross attention based approach combined with information maximization. The CNN-LSTM based cross-attention highlights the information in the image and language modalities and establishes a connection between the two, while information maximization learns a low dimensional bottleneck layer, that has all the relevant information required to carry out the VQA task. We evaluate our method on two VQA remote sensing datasets of different resolutions. For the high resolution dataset, we achieve an overall accuracy of 79.11% and 73.87% for the two test sets while for the low resolution dataset, we achieve an overall accuracy of 85.98%.
Multimodal Fusion Transformer for Remote Sensing Image Classification
Roy, Swalpa Kumar, Deria, Ankur, Hong, Danfeng, Rasti, Behnood, Plaza, Antonio, Chanussot, Jocelyn
Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar transformers use an external classification (CLS) token which is randomly initialized and often fails to generalize well, whereas other sources of multimodal datasets, such as light detection and ranging (LiDAR) offer the potential to improve these models by means of a CLS. In this paper, we introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification. Our mCrossPA utilizes other sources of complementary information in addition to the HSI in the transformer encoder to achieve better generalization. The concept of tokenization is used to generate CLS and HSI patch tokens, helping to learn a {distinctive representation} in a reduced and hierarchical feature space. Extensive experiments are carried out on {widely used benchmark} datasets {i.e.,} the University of Houston, Trento, University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. We compare the results of the proposed MFT model with other state-of-the-art transformers, classical CNNs, and conventional classifiers models. The superior performance achieved by the proposed model is due to the use of multihead cross patch attention. The source code will be made available publicly at \url{https://github.com/AnkurDeria/MFT}.}
IRX-1D: A Simple Deep Learning Architecture for Remote Sensing Classifications
Pal, Mahesh, Akshay, null, Teja, B. Charan
We proposes a simple deep learning architecture combining elements of Inception, ResNet and Xception networks. Four new datasets were used for classification with both small and large training samples. Results in terms of classification accuracy suggests improved performance by proposed architecture in comparison to Bayesian optimised 2D-CNN with small training samples. Comparison of results using small training sample with Indiana Pines hyperspectral dataset suggests comparable or better performance by proposed architecture than nine reported works using different deep learning architectures. In spite of achieving high classification accuracy with limited training samples, comparison of classified image suggests different land cover classes are assigned to same area when compared with the classified image provided by the model trained using large training samples with all datasets.
Transforming Observations of Ocean Temperature with a Deep Convolutional Residual Regressive Neural Network
Larson, Albert, Akanda, Ali Shafqat
Sea surface temperature (SST) is an essential climate variable that can be measured via ground truth, remote sensing, or hybrid model methodologies. Here, we celebrate SST surveillance progress via the application of a few relevant technological advances from the late 20th and early 21st century. We further develop our existing water cycle observation framework, Flux to Flow (F2F), to fuse AMSR-E and MODIS into a higher resolution product with the goal of capturing gradients and filling cloud gaps that are otherwise unavailable. Our neural network architecture is constrained to a deep convolutional residual regressive neural network. We utilize three snapshots of twelve monthly SST measurements in 2010 as measured by the passive microwave radiometer AMSR-E, the visible and infrared monitoring MODIS instrument, and the in situ Argo dataset ISAS. The performance of the platform and success of this approach is evaluated using the root mean squared error (RMSE) metric. We determine that the 1:1 configuration of input and output data and a large observation region is too challenging for the single compute node and dcrrnn structure as is. When constrained to a single 100 x 100 pixel region and a small training dataset, the algorithm improves from the baseline experiment covering a much larger geography. For next discrete steps, we envision the consideration of a large input range with a very small output range. Furthermore, we see the need to integrate land and sea variables before performing computer vision tasks like those within. Finally, we see parallelization as necessary to overcome the compute obstacles we encountered.
CD-CTFM: A Lightweight CNN-Transformer Network for Remote Sensing Cloud Detection Fusing Multiscale Features
Ge, Wenxuan, Yang, Xubing, Zhang, Li
Clouds in remote sensing images inevitably affect information extraction, which hinder the following analysis of satellite images. Hence, cloud detection is a necessary preprocessing procedure. However, the existing methods have numerous calculations and parameters. In this letter, a lightweight CNN-Transformer network, CD-CTFM, is proposed to solve the problem. CD-CTFM is based on encoder-decoder architecture and incorporates the attention mechanism. In the decoder part, we utilize a lightweight network combing CNN and Transformer as backbone, which is conducive to extract local and global features simultaneously. Moreover, a lightweight feature pyramid module is designed to fuse multiscale features with contextual information. In the decoder part, we integrate a lightweight channel-spatial attention module into each skip connection between encoder and decoder, extracting low-level features while suppressing irrelevant information without introducing many parameters. Finally, the proposed model is evaluated on two cloud datasets, 38-Cloud and MODIS. The results demonstrate that CD-CTFM achieves comparable accuracy as the state-of-art methods. At the same time, CD-CTFM outperforms state-of-art methods in terms of efficiency.
DeepLCZChange: A Remote Sensing Deep Learning Model Architecture for Urban Climate Resilience
Sun, Wenlu, Sun, Yao, Liu, Chenying, Albrecht, Conrad M
Urban land use structures impact local climate conditions of metropolitan areas. To shed light on the mechanism of local climate wrt. urban land use, we present a novel, data-driven deep learning architecture and pipeline, DeepLCZChange, to correlate airborne LiDAR data statistics with the Landsat 8 satellite's surface temperature product. A proof-of-concept numerical experiment utilizes corresponding remote sensing data for the city of New York to verify the cooling effect of urban forests.
Recent applications of machine learning, remote sensing, and iot approaches in yield prediction: a critical review
Bassine, Fatima Zahra, Epule, Terence Epule, Kechchour, Ayoub, Chehbouni, Abdelghani
The integration of remote sensing and machine learning in agriculture is transforming the industry by providing insights and predictions through data analysis. This combination leads to improved yield prediction and water management, resulting in increased efficiency, better yields, and more sustainable agricultural practices. Achieving the United Nations' Sustainable Development Goals, especially "zero hunger," requires the investigation of crop yield and precipitation gaps, which can be accomplished through, the usage of artificial intelligence (AI), machine learning (ML), remote sensing (RS), and the internet of things (IoT). By integrating these technologies, a robust agricultural mobile or web application can be developed, providing farmers and decision-makers with valuable information and tools for improving crop management and increasing efficiency. Several studies have investigated these new technologies and their potential for diverse tasks such as crop monitoring, yield prediction, irrigation management, etc. Through a critical review, this paper reviews relevant articles that have used RS, ML, cloud computing, and IoT in crop yield prediction. It reviews the current state-of-the-art in this field by critically evaluating different machine-learning approaches proposed in the literature for crop yield prediction and water management. It provides insights into how these methods can improve decision-making in agricultural production systems. This work will serve as a compendium for those interested in yield prediction in terms of primary literature but, most importantly, what approaches can be used for real-time and robust prediction.
DISCount: Counting in Large Image Collections with Detector-Based Importance Sampling
Perez, Gustavo, Maji, Subhransu, Sheldon, Daniel
Many modern applications use computer vision to detect and count objects in massive image collections. However, when the detection task is very difficult or in the presence of domain shifts, the counts may be inaccurate even with significant investments in training data and model development. We propose DISCount -- a detector-based importance sampling framework for counting in large image collections that integrates an imperfect detector with human-in-the-loop screening to produce unbiased estimates of counts. We propose techniques for solving counting problems over multiple spatial or temporal regions using a small number of screened samples and estimate confidence intervals. This enables end-users to stop screening when estimates are sufficiently accurate, which is often the goal in a scientific study. On the technical side we develop variance reduction techniques based on control variates and prove the (conditional) unbiasedness of the estimators. DISCount leads to a 9-12x reduction in the labeling costs over naive screening for tasks we consider, such as counting birds in radar imagery or estimating damaged buildings in satellite imagery, and also surpasses alternative covariate-based screening approaches in efficiency.
The Canadian Cropland Dataset: A New Land Cover Dataset for Multitemporal Deep Learning Classification in Agriculture
Jacques, Amanda A. Boatswain, Diallo, Abdoulaye Baniré, Lord, Etienne
Monitoring land cover using remote sensing is vital for studying environmental changes and ensuring global food security through crop yield forecasting. Specifically, multitemporal remote sensing imagery provides relevant information about the dynamics of a scene, which has proven to lead to better land cover classification results. Nevertheless, few studies have benefited such high spatial and temporal resolution data due to the difficulty of accessing reliable, fine-grained and high-quality annotated samples to support their hypotheses. Therefore, we introduce a temporal patch-based dataset of Canadian croplands, enriched with labels retrieved from the Canadian Annual Crop Inventory. The dataset contains 78,536 manually verified and curated high-resolution (10 m/pixel, 640 x 640 m) geo-referenced images from 10 crop classes collected over four crop production years (2017-2020) and five months (June-October). Each instance contains 12 spectral bands, an RGB image, and additional vegetation index bands. Individually, each category contains at least 4,800 images. Moreover, as a benchmark, we provide models and source code that allow a user to predict the crop class using a single image (ResNet, DenseNet, EfficientNet) or a sequence of images (LRCN, 3D-CNN) from the same location. In perspective, we expect this evolving dataset to propel the creation of robust agro-environmental models that can accelerate the comprehension of complex agricultural regions by providing accurate and continuous monitoring of land cover.