On Approximation in Deep Convolutional Networks: a Kernel Perspective

Bietti, Alberto

arXiv.org Machine Learning 

Deep convolutional models have been at the heart of the recent successes of deep learning in problems where the data consists of high-dimensional signals, such as image classification or speech recognition. The convolution and pooling operations in these architectures are known to be crucial for their practical success, yet our theoretical understanding of how they enable efficient learning is still limited. One key difficulty for understanding such models is the curse of dimensionality: due to the high dimensionality of the input data, it is hopeless to learn arbitrary functions from samples. For instance, classical non-parametric regression techniques for approximating Lipschitz or Sobolev functions typically require either low dimension or an order of smoothness of the target function comparable to the dimension in order to obtain good generalization (e.g., Wainwright, 2019), which is a very strong assumption when dealing with high-dimensional signals. Thus, further assumptions on the target function are needed to make the problem more tractable, in a way that makes convolutions a useful modeling tool. Various works have studied approximation benefits with models that resemble deep convolutional architectures, for instance through hierarchical models with local connectivity (Mhaskar and Poggio, 2016; Schmidt-Hieber et al., 2020), or through structured tensor decompositions (Cohen and Shashua, 2017). Nevertheless, while such function classes may provide improved statistical efficiency, it is unclear if they can be learned with computationally efficient algorithms, which makes it difficult to assess the validity of these approximation models empirically. In order to overcome the computational difficulties, we provide a different perspective based on kernel methods (e.g., Schölkopf and Smola, 2001; Wahba, 1990), which are known to be computationally tractable with well-understood statistical and approximation properties.
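To make the tractability of kernel methods concrete, the following is a minimal sketch of kernel ridge regression, in which fitting reduces to solving a single n x n linear system. It uses a generic Gaussian kernel rather than the convolutional kernels considered in this work; the function names, the bandwidth, the regularization value, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    sq_dists = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def krr_fit(X, y, lam=1e-3, bandwidth=0.5):
    """Return dual coefficients alpha solving (K + n * lam * I) alpha = y."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_new, bandwidth=0.5):
    """Evaluate the fitted function f(x) = sum_i alpha_i k(x, x_i) at X_new."""
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha

# Synthetic low-dimensional regression problem (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)

alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X)
train_mse = float(np.mean((pred - y) ** 2))
```

The linear solve costs on the order of n^3 operations for n samples, so the computation is polynomial in the sample size regardless of the input dimension. The choice of kernel is what determines which functions can be learned efficiently.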
In particular, we consider "deep" structured kernels known as convolutional kernels, which have produced good empirical performance on standard
