On the VC dimension of deep group convolutional neural networks

Anna Sepliarskaia, Sophie Langer, Johannes Schmidt-Hieber


Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision, achieving remarkable success in tasks such as image classification (Krizhevsky et al., 2012), object detection (Ren et al., 2016), and segmentation (Long et al., 2015). Their effectiveness can be partly attributed to their translation invariant architecture, enabling CNNs to recognize objects regardless of their position in an image. However, while CNNs are effective at capturing translation symmetries, there has been growing interest in incorporating additional structure into neural networks to handle a wider range of transformations. These architectures aim to combine the flexibility of learning with the robustness of structure-preserving features (see, e.g., Hinton and Wang, 2011; Lee et al., 2015).

Group Convolutional Neural Networks (GCNNs) were first introduced by Cohen and Welling (2016a) to improve statistical efficiency and enhance geometric reasoning. Since then, equivariant network structures have evolved to support equivariance on Euclidean groups (Bekkers et al., 2018; Bekkers, 2019; Weiler et al., 2018), compact groups (Kondor and Trivedi, 2018), and Riemannian manifolds (Weiler et al., 2021). More recent architectures have been generalized even further, to other types of symmetry groups (Zhdanov et al., 2024; Dehmamy et al., 2021; Smets et al., 2023).
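To make the underlying notion precise (in generic notation of our own, not that of the cited works): a map $f$ between spaces on which a group $G$ acts is called equivariant if transforming the input commutes with applying the map,

$$ f(g \cdot x) = g \cdot f(x) \qquad \text{for all } g \in G \text{ and all inputs } x, $$

and invariant in the special case where the action on the output space is trivial, so that $f(g \cdot x) = f(x)$. Ordinary convolutional layers are equivariant with respect to the translation group; GCNNs extend this property to larger groups, such as those generated by translations together with rotations and reflections.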