MorphPool: Efficient Non-linear Pooling & Unpooling in CNNs

Rick Groenendijk, Leo Dorst, Theo Gevers

arXiv.org Artificial Intelligence 

Contemporary deep learning architectures use pooling operations for two reasons: to select the most impactful activation values from feature maps, and to reduce the spatial size of those feature maps [28]. The most widely used pooling operation is max pooling, which appears in nearly all common network architectures, such as ResNet [14], VGGNet [32], and DenseNet [16]. These architectures can also be applied to pixel-level prediction tasks such as semantic segmentation. To do so, inputs are first down-sampled to a set of latent features of small spatial size and then up-sampled back to full resolution. Up-sampling from pooled features is most often done with a combination of unpooling and deconvolution [41, 42], as in seminal works such as [3, 22, 26]. As will be shown in this paper, down-sampling with max pooling can be formalised and improved using mathematical morphology, the mathematics of contact. Ever since the work of Serra [29], the algebraic structure underlying data acquired by probing contact (e.g. LiDAR and radar) has been known to the computer vision community [5, 11, 25, 33]. This structure differs from the algebra of linear diffusion on which convolutional neural networks (CNNs) are built.
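
To make the pooling and unpooling pair mentioned above concrete, the following is a minimal NumPy sketch of standard 2x2 max pooling with stride 2 and the matching index-based unpooling. It illustrates the conventional operations only, not the morphological pooling this paper proposes. The function names `max_pool_2x2` and `max_unpool_2x2` are hypothetical, and the sketch assumes a single 2D feature map with even height and width.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling, stride 2, on a 2D array with even height and width.

    Returns the pooled map and, per window, the position (0..3, row-major)
    of the maximum inside that window.
    """
    h, w = x.shape
    # Regroup into shape (h/2, w/2, 4): the four values of each 2x2 window.
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    idx = windows.argmax(axis=-1)
    pooled = np.take_along_axis(windows, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def max_unpool_2x2(pooled, idx):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    np.put_along_axis(windows, idx[..., None], pooled[..., None], axis=-1)
    # Invert the regrouping used in max_pool_2x2.
    return (windows.reshape(ph, pw, 2, 2)
                   .transpose(0, 2, 1, 3)
                   .reshape(ph * 2, pw * 2))

x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 5.],
              [7., 0., 1., 1.],
              [0., 0., 1., 9.]])
pooled, idx = max_pool_2x2(x)
restored = max_unpool_2x2(pooled, idx)
```

Here `pooled` keeps only the window maxima, and `restored` has the input's shape with each maximum at its original location and zeros everywhere else, which is why unpooling is usually followed by a deconvolution to fill in the missing values.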
