Collaborating Authors

Fang Qingyun


Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery

arXiv.org Artificial Intelligence

Object detection is a canonical task in both computer vision and remote sensing. In remote sensing imagery, it involves detecting instances of visual objects of certain classes, most of which are man-made, such as buildings, airplanes, ships, and vehicles. This technology has been widely used in many civilian and military fields, such as port and airport flow monitoring, traffic diversion, urban planning, and search and rescue of lost ships. Traditional machine learning (ML) schemes based on the encoding of handcrafted features (e.g., textures, color histograms, or more complex descriptors such as HOG Dalal and Triggs (2005), SIFT Lowe (2004), Haar Viola and Jones (2001), ACF Dollár, Appel, Belongie and Perona (2014), etc.) can only generate shallow- to middle-level features with limited representational power. Recently, with the rapid development of deep learning (DL), convolutional neural networks (CNNs) have become a new and powerful approach for feature extraction and have greatly improved the performance of object detection. Current CNN-based object detection methods can be roughly divided into two streams: two-stage schemes and one-stage schemes. Two-stage detectors, such as R-CNN Girshick, Donahue, Darrell and Malik (2014), Fast R-CNN Girshick (2015), Faster R-CNN Ren, He, Girshick and Sun (2017), and other detectors Cai and Vasconcelos (2018); Pang, Chen, Shi, Feng, Ouyang and Lin (2019); Li, Chen, Wang and Zhang (2019b), divide detection into localization and recognition stages, and thus have one more region-proposal step than one-stage detectors.