Understanding neural networks with reproducing kernel Banach spaces

Bartolucci, Francesca, De Vito, Ernesto, Rosasco, Lorenzo, Vigogna, Stefano

arXiv.org Machine Learning 

In this paper we discuss how the theory of reproducing kernel Banach spaces can be used to tackle the challenge of characterizing the function spaces associated with neural networks. In particular, we prove a representer theorem for a wide class of reproducing kernel Banach spaces that admit a suitable integral representation and include one-hidden-layer neural networks of possibly infinite width. Further, we show that, for a suitable class of ReLU activation functions, the norm in the corresponding reproducing kernel Banach space can be characterized in terms of the inverse Radon transform of a bounded real measure, with norm given by the total variation norm of the measure. Our analysis simplifies and extends recent results in [34, 29, 30].

Neural networks provide a flexible and effective class of machine learning models, obtained by recursively composing linear and nonlinear functions. The resulting models are nonlinearly parameterized functions, and typically require nonconvex optimization procedures [14]. While this does not prevent good empirical performance, it makes understanding the properties of neural networks considerably harder. Indeed, characterizing which function classes can be well represented or approximated by neural networks is a natural question, albeit one far from being answered [31, 2, 34, 29, 30, 15]. Moreover, networks with large numbers of parameters are often successful in practice, seemingly contradicting the idea that models should be simple in order to be learned from data [48, 6]. This observation raises the question of in what sense the complexity of such models is explicitly or implicitly controlled.
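To fix ideas, the following is a sketch of the kind of integral representation alluded to above for a one-hidden-layer ReLU network of possibly infinite width; the precise space of measures, the treatment of the affine (bias and linear) part, and the exact definition of the norm are assumptions here and are made rigorous in the paper.

% A network of infinite width is parameterized by a bounded real (signed)
% measure \mu over weight-bias pairs, with ReLU activation (t)_+ = max(t, 0):
\[
  f_\mu(x) \;=\; \int_{\mathbb{S}^{d-1} \times \mathbb{R}} \bigl(\langle w, x \rangle - b\bigr)_+ \, d\mu(w, b),
  \qquad x \in \mathbb{R}^d .
\]
% The associated norm is taken as the least total variation over all
% measures representing the same function:
\[
  \|f\| \;=\; \inf \bigl\{ \, |\mu|(\mathbb{S}^{d-1} \times \mathbb{R}) \;:\; f = f_\mu \, \bigr\} .
\]

A finite network with $N$ neurons corresponds to a discrete measure $\mu = \sum_{j=1}^{N} c_j \, \delta_{(w_j, b_j)}$, in which case the total variation reduces to $\sum_j |c_j|$, an $\ell^1$-type control on the outer weights; the results discussed in the abstract relate this variation norm to a Radon-domain quantity.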