MorphPool: Efficient Non-linear Pooling & Unpooling in CNNs
Groenendijk, Rick, Dorst, Leo, Gevers, Theo
arXiv.org Artificial Intelligence
Contemporary deep learning architectures use pooling operations for two reasons: to filter impactful activation values from feature maps, and to reduce the spatial size of features [28]. The most widely used pooling operation is max pooling, which appears in nearly all common network architectures, such as ResNet [14], VGGNet [32], and DenseNet [16]. These architectures can be applied to pixel-level prediction tasks, such as semantic segmentation. To do so, inputs are down-sampled to a set of latent features of small spatial size and then up-sampled back to full resolution. Up-sampling from pooled feature sets is most often performed with a combination of unpooling and deconvolution [41, 42], as in seminal works such as [3, 22, 26]. As will be shown in this paper, down-sampling with max pooling can be formalised and improved using mathematical morphology, the mathematics of contact. Since the work of Serra [29], the underlying algebraic structure of data acquired by probing contact (e.g. LiDAR and radar) has been known to the computer vision community [5, 11, 25, 33]. This structure differs from the algebra of linear diffusion that is used to build convolutional neural networks (CNNs).
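To make the link between max pooling and morphology concrete, the following is a minimal pure-Python sketch, not the paper's implementation. It assumes a 2x2 window with stride 2 on a single-channel feature map with even height and width. The function names are hypothetical, and the unpooling shown is the standard index-based variant that writes zeros elsewhere, not the MorphPool operator itself. The sketch illustrates that max pooling returns the same values as a flat 2x2 running maximum (a grayscale dilation, up to the reflection convention of the structuring element) followed by subsampling.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also records where each maximum came from."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h - 1, 2):
        pooled_row, index_row = [], []
        for j in range(0, w - 1, 2):
            # Collect (value, position) pairs for the 2x2 window at (i, j).
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in (0, 1) for dj in (0, 1)]
            value, pos = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            index_row.append(pos)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, shape):
    """Standard index-based unpooling: put each value back, zeros elsewhere."""
    h, w = shape
    out = [[0] * w for _ in range(h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (i, j) in zip(pooled_row, index_row):
            out[i][j] = value
    return out


def flat_max_2x2(x):
    """Running 2x2 maximum at every pixel (flat structuring element, clamped border)."""
    h, w = len(x), len(x[0])
    return [[max(x[min(i + di, h - 1)][min(j + dj, w - 1)]
                 for di in (0, 1) for dj in (0, 1))
             for j in range(w)]
            for i in range(h)]


def subsample(x, stride=2):
    """Keep every `stride`-th row and column."""
    return [row[::stride] for row in x[::stride]]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 1],
]

pooled, indices = max_pool_2x2(feature_map)
restored = max_unpool_2x2(pooled, indices, (4, 4))
# Max pooling agrees with "flat running maximum, then subsample".
same_as_morphology = pooled == subsample(flat_max_2x2(feature_map))
```

In this toy case only the four window maxima survive the round trip through pooling and unpooling; every other position is filled with zero.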
Nov-25-2022