On Approximation in Deep Convolutional Networks: a Kernel Perspective

Feb-19-2021–arXiv.org Machine Learning

Deep convolutional models have been at the heart of the recent successes of deep learning in problems where the data consists of high-dimensional signals, such as image classification or speech recognition. The convolution and pooling operations in these architectures are known to be crucial for their practical success, yet our theoretical understanding of how they enable efficient learning is still limited.

One key difficulty in understanding such models is the curse of dimensionality: because the input data is high-dimensional, it is hopeless to learn arbitrary functions from samples. For instance, classical non-parametric regression techniques for approximating Lipschitz or Sobolev functions typically require either a low dimension or an order of smoothness of the target function comparable to the dimension in order to generalize well (e.g., Wainwright, 2019), which is a very strong assumption when dealing with high-dimensional signals. Thus, further assumptions on the target function are needed to make the problem more tractable, in a way that makes convolutions a useful modeling tool.

Various works have studied the approximation benefits of models that resemble deep convolutional architectures, for instance through hierarchical models with local connectivity (Mhaskar and Poggio, 2016; Schmidt-Hieber et al., 2020), or through structured tensor decompositions (Cohen and Shashua, 2017). Nevertheless, while such function classes may provide improved statistical efficiency, it is unclear whether they can be learned with computationally efficient algorithms, which makes it difficult to assess the validity of these approximation models empirically.

To overcome these computational difficulties, we provide a different perspective based on kernel methods (e.g., Schölkopf and Smola, 2001; Wahba, 1990), which are known to be computationally tractable, with well-understood statistical and approximation properties.
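To illustrate what computational tractability means for kernel methods, here is a minimal sketch of kernel ridge regression, where fitting reduces to solving a single linear system. It assumes NumPy and uses a plain Gaussian kernel on a toy problem. The kernel choice, the bandwidth, the regularization `lam`, and the function names `gaussian_kernel`, `fit_krr` and `predict_krr` are illustrative assumptions, not the convolutional kernels or the setup studied in the paper.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def fit_krr(X, y, lam=1e-3, bandwidth=1.0):
    """Solve (K + lam * n * I) alpha = y, the closed-form kernel ridge solution."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict_krr(X_train, alpha, X_new, bandwidth=1.0):
    """Predict f(x) = sum_i alpha_i k(x, x_i) at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha

# Toy data (hypothetical): a smooth target that depends on one coordinate only.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)

alpha = fit_krr(X, y, lam=1e-3, bandwidth=0.5)
X_test = rng.uniform(-1.0, 1.0, size=(100, 2))
pred = predict_krr(X, alpha, X_test, bandwidth=0.5)
mse = float(np.mean((pred - np.sin(3.0 * X_test[:, 0])) ** 2))
```

The only expensive step is the dense `n x n` solve, which is convex and has a unique solution. A structured kernel would change only `gaussian_kernel`, not the fitting procedure.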
In particular, we consider "deep" structured kernels known as convolutional kernels, which have produced good empirical performance on standard
