rkb
Deep Networks are Reproducing Kernel Chains
Heeringa, Tjeerd Jan, Spek, Len, Brune, Christoph
Identifying an appropriate function space for deep neural networks remains a key open question. While shallow neural networks are naturally associated with Reproducing Kernel Banach Spaces (RKBS), deep networks present unique challenges. In this work, we extend RKBS to chain RKBS (cRKBS), a new framework that composes kernels rather than functions, preserving the desirable properties of RKBS. We prove that any deep neural network function is a neural cRKBS function, and conversely, any neural cRKBS function defined on a finite dataset corresponds to a deep neural network. This approach provides a sparse solution to the empirical risk minimization problem, requiring no more than $N$ neurons per layer, where $N$ is the number of data points.
Mirror Descent on Reproducing Kernel Banach Spaces
Kumar, Akash, Belkin, Mikhail, Pandit, Parthe
Recent advances in machine learning have led to increased interest in reproducing kernel Banach spaces (RKBS) as a more general framework that extends beyond reproducing kernel Hilbert spaces (RKHS). These works have resulted in the formulation of representer theorems under several regularized learning schemes. However, little is known about an optimization method that encompasses these results in this setting. This paper addresses a learning problem on Banach spaces endowed with a reproducing kernel, focusing on efficient optimization within RKBS. To tackle this challenge, we propose an algorithm based on mirror descent (MDA). Our approach involves an iterative method that employs gradient steps in the dual space of the Banach space using the reproducing kernel. We analyze the convergence properties of our algorithm under various assumptions and establish two types of results: first, we identify conditions under which a linear convergence rate is achievable, akin to optimization in the Euclidean setting, and provide a proof of the linear rate; second, we demonstrate a standard convergence rate in a constrained setting. Moreover, to instantiate this algorithm in practice, we introduce a novel family of RKBSs with $p$-norm ($p \neq 2$), characterized by both an explicit dual map and a kernel.
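The mirror descent iteration described above can be illustrated with a finite-dimensional analogue. This is a minimal sketch, not the authors' RKBS algorithm. It assumes the separable mirror map $\psi(w) = \frac{1}{p}\sum_i |w_i|^p$ on $\mathbb{R}^d$ with $p > 1$, and a least-squares loss. The names `mirror_descent`, `grad_psi` and `grad_psi_inv` are hypothetical. Each step maps the iterate to the dual space with $\nabla\psi$, takes a gradient step there, and maps back with $(\nabla\psi)^{-1}$.

```python
import math

# Hypothetical finite-dimensional stand-in for mirror descent on an RKBS:
# mirror map psi(w) = (1/p) * sum_i |w_i|**p on R^d, with p > 1.


def grad_psi(w, p):
    # Primal -> dual map: (grad psi)(w)_i = sign(w_i) * |w_i|**(p - 1).
    return [math.copysign(abs(x) ** (p - 1), x) for x in w]


def grad_psi_inv(theta, p):
    # Dual -> primal map (inverse of grad_psi): sign(t) * |t|**(1 / (p - 1)).
    return [math.copysign(abs(t) ** (1.0 / (p - 1)), t) for t in theta]


def lsq_grad(w, X, y):
    # Gradient of f(w) = (1 / 2n) * sum_j (<x_j, w> - y_j)**2.
    n = len(X)
    g = [0.0] * len(w)
    for xj, yj in zip(X, y):
        r = sum(a * b for a, b in zip(xj, w)) - yj
        for i, a in enumerate(xj):
            g[i] += r * a / n
    return g


def mirror_descent(X, y, p=1.5, step=0.1, iters=2000):
    w = [0.0] * len(X[0])
    for _ in range(iters):
        theta = grad_psi(w, p)                               # move to the dual space
        g = lsq_grad(w, X, y)
        theta = [t - step * gi for t, gi in zip(theta, g)]   # gradient step in the dual
        w = grad_psi_inv(theta, p)                           # map back to the primal space
    return w
```

For $p = 2$ both maps are the identity, and the iteration reduces to plain gradient descent.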
Neural reproducing kernel Banach spaces and representer theorems for deep networks
Bartolucci, Francesca, De Vito, Ernesto, Rosasco, Lorenzo, Vigogna, Stefano
Studying the function spaces defined by neural networks helps to understand the corresponding learning models and their inductive bias. While in some limits neural networks correspond to function spaces that are reproducing kernel Hilbert spaces, these regimes do not capture the properties of the networks used in practice. In contrast, in this paper we show that deep neural networks define suitable reproducing kernel Banach spaces. These spaces are equipped with norms that enforce a form of sparsity, enabling them to adapt to potential latent structures within the input data and their representations. In particular, leveraging the theory of reproducing kernel Banach spaces, combined with variational results, we derive representer theorems that justify the finite architectures commonly employed in applications. Our study extends analogous results for shallow networks and can be seen as a step towards considering more practically plausible neural architectures.
Sparse Representer Theorems for Learning in Reproducing Kernel Banach Spaces
Wang, Rui, Xu, Yuesheng, Yan, Mingsong
Sparsity of a learning solution is a desirable feature in machine learning. Certain reproducing kernel Banach spaces (RKBSs) are appropriate hypothesis spaces for sparse learning methods. The goal of this paper is to understand what kind of RKBSs can promote sparsity of learning solutions. We consider two typical learning models in an RKBS: the minimum norm interpolation (MNI) problem and the regularization problem. We first establish an explicit representer theorem for solutions of these problems, which represents the extreme points of the solution set as linear combinations of the extreme points of the subdifferential set of the norm function, which is data-dependent. We then propose sufficient conditions on the RKBS under which this explicit representation becomes a sparse kernel representation with fewer terms than the number of observed data points. Under these sufficient conditions, we investigate the role of the regularization parameter in the sparsity of the regularized solutions. We further show that two specific RKBSs, the sequence space $\ell_1(\mathbb{N})$ and the measure space, admit sparse representer theorems for both the MNI and regularization models.
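The sparsity phenomenon for $\ell_1$ can be seen on a finite truncation of $\ell_1(\mathbb{N})$. The sketch below is an illustration, not the paper's construction. It assumes a full-row-rank $n \times d$ matrix $A$ and solves the MNI problem $\min \|x\|_1$ subject to $Ax = y$ by brute force. It relies on the standard linear programming fact that some minimizer is supported on at most $n$ linearly independent columns. The helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy MNI solver on a truncation of l1(N): min ||x||_1 s.t. A x = y.
# Some minimizer is supported on n linearly independent columns of A
# (A of full row rank n), so trying every n-column support suffices.


def solve_square(M, b, tol=1e-12):
    # Gauss-Jordan elimination with partial pivoting; None if M is singular.
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < tol:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def min_l1_interpolant(A, y):
    n, d = len(A), len(A[0])
    best, best_norm = None, float("inf")
    for support in combinations(range(d), n):
        sub = [[A[i][j] for j in support] for i in range(n)]
        coeffs = solve_square(sub, y)
        if coeffs is None:
            continue  # the chosen columns are linearly dependent
        norm = sum(abs(c) for c in coeffs)
        if norm < best_norm:
            x = [0.0] * d
            for j, c in zip(support, coeffs):
                x[j] = c
            best, best_norm = x, norm
    return best, best_norm
```

Enumerating all $\binom{d}{n}$ supports is only viable at toy sizes. In practice, a linear programming solver would be used instead. Either way, the returned interpolant has at most $n$ nonzero entries, one per data point.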
Duality for Neural Networks through Reproducing Kernel Banach Spaces
Spek, Len, Heeringa, Tjeerd Jan, Schwenninger, Felix, Brune, Christoph
Reproducing kernel Hilbert spaces (RKHS) have been a very successful tool in various areas of machine learning. Recently, Barron spaces have been used to prove bounds on the generalisation error for neural networks. Unfortunately, Barron spaces cannot be understood in terms of RKHS due to the strong nonlinear coupling of the weights. This can be solved by using the more general reproducing kernel Banach spaces (RKBS). We show that these Barron spaces belong to a class of integral RKBS. This class can also be understood as an infinite union of RKHS. Furthermore, we show that the dual space of such an RKBS is again an RKBS in which the roles of the data and the parameters are interchanged, forming an adjoint pair of RKBSs including a reproducing kernel. This allows us to construct the saddle point problem for neural networks, to which the whole field of primal-dual optimisation can be applied.
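The interchange of data and parameters can be illustrated with atomic measures. This sketch rests on an assumption and is not the paper's general construction: a single kernel $K(x, w) = \max(\langle w, x\rangle, 0)$ couples a data point $x$ and a parameter $w$. A finite network $f(x) = \sum_i a_i K(x, w_i)$ is then an atomic measure on parameters. A weighted data set defines $g(w) = \sum_j b_j K(x_j, w)$ on parameter space. Pairing either function with the other measure gives the same number, $\sum_{i,j} a_i b_j K(x_j, w_i)$. All names and numbers are hypothetical.

```python
# Hypothetical illustration: one kernel K(x, w) links data space and parameter space.


def relu(t):
    return max(t, 0.0)


def kernel(x, w):
    # K(x, w) = relu(<w, x>), read either as a function of x or of w.
    return relu(sum(a * b for a, b in zip(x, w)))


def primal_function(neurons):
    # f(x) = sum_i a_i K(x, w_i): a network, i.e. an atomic measure on parameters.
    return lambda x: sum(a * kernel(x, w) for a, w in neurons)


def dual_function(data):
    # g(w) = sum_j b_j K(x_j, w): a weighted data set, read as a function of w.
    return lambda w: sum(b * kernel(x, w) for b, x in data)


neurons = [(1.5, (1.0, -0.5)), (-0.7, (0.2, 0.9))]
data = [(0.3, (2.0, 1.0)), (-1.2, (-0.5, 1.5)), (0.8, (1.0, 1.0))]

f = primal_function(neurons)
g = dual_function(data)

# Both sides evaluate the double sum sum_{i,j} a_i * b_j * K(x_j, w_i).
lhs = sum(b * f(x) for b, x in data)
rhs = sum(a * g(w) for a, w in neurons)
```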
Understanding neural networks with reproducing kernel Banach spaces
Bartolucci, Francesca, De Vito, Ernesto, Rosasco, Lorenzo, Vigogna, Stefano
In this paper we discuss how the theory of reproducing kernel Banach spaces can be used to tackle the challenge of characterizing the function spaces corresponding to neural networks. In particular, we prove a representer theorem for a wide class of reproducing kernel Banach spaces that admit a suitable integral representation and include one-hidden-layer neural networks of possibly infinite width. Further, we show that, for a suitable class of ReLU activation functions, the norm in the corresponding reproducing kernel Banach space can be characterized in terms of the inverse Radon transform of a bounded real measure, with norm given by the total variation norm of the measure. Our analysis simplifies and extends recent results in [34, 29, 30]. Neural networks provide a flexible and effective class of machine learning models, built by recursively composing linear and nonlinear functions. The resulting models are nonlinearly parameterized functions and typically require non-convex optimization procedures [14]. While this does not prevent good empirical performance, it makes understanding the properties of neural networks considerably harder. Indeed, characterizing which function classes can be well represented or approximated by neural networks is a natural question, yet one far from being answered [31, 2, 34, 29, 30, 15]. Moreover, networks with large numbers of parameters are often practically successful, seemingly contradicting the idea that models should be simple to be learned from data [48, 6]. This observation raises the question of in what sense the complexity of the models is explicitly or implicitly controlled.
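The link between finite-width ReLU networks and measures can be sketched as follows. Assume, as in integral RKBS constructions, that the norm of $f$ is the infimum of the total variation over all measures representing $f$, with inner weights restricted to the unit sphere. Under that assumption, the positive homogeneity of ReLU lets each neuron be rescaled to a unit inner weight. The resulting sum $\sum_i |c_i|\,\|w_i\|$ is then an upper bound on that norm. The function names and test values are hypothetical.

```python
import math

# Hypothetical sketch: a shallow ReLU network as an atomic measure on
# (unit-norm inner weight, bias) pairs, and its total variation.


def relu(t):
    return max(t, 0.0)


def shallow_relu(neurons, x):
    # f(x) = sum_i c_i * relu(<w_i, x> - b_i).
    return sum(c * relu(sum(wi * xi for wi, xi in zip(w, x)) - b) for c, w, b in neurons)


def normalize(neurons):
    # Positive homogeneity of ReLU (assumes every w_i is nonzero):
    # c * relu(<w, x> - b) = (c * |w|) * relu(<w / |w|, x> - b / |w|).
    out = []
    for c, w, b in neurons:
        s = math.sqrt(sum(wi * wi for wi in w))
        out.append((c * s, [wi / s for wi in w], b / s))
    return out


def variation_norm_bound(neurons):
    # Total variation of the atomic measure sum_i (c_i * |w_i|) delta_(w_i/|w_i|, b_i/|w_i|):
    # an upper bound, since the norm is an infimum over all representing measures.
    return sum(abs(c) for c, _, _ in normalize(neurons))
```

Normalizing does not change the function, only its parameterization. The bound is therefore a property of one particular representation of $f$.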
Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
Wright, Matthew A., Gonzalez, Joseph E.
Despite their ubiquity in core AI fields like natural language processing, the mechanics of deep attention-based neural networks like the Transformer model are not fully understood. In this article, we present a new perspective on how Transformers work. In particular, we show that the "dot-product attention" that is the core of the Transformer's operation can be characterized as a kernel learning method on a pair of Banach spaces. Moreover, the Transformer's kernel is shown to have an infinite feature dimension. Along the way we consider an extension of the standard kernel learning problem to a binary setting, where data come from two input domains and a response is defined for every cross-domain pair. We prove a new representer theorem for these binary kernel machines with non-Mercer (indefinite, asymmetric) kernels (implying that the functions learned are elements of reproducing kernel Banach spaces rather than Hilbert spaces), and also prove a new universal approximation theorem showing that the Transformer calculation can learn any binary non-Mercer reproducing kernel Banach space pair. We experiment with new kernels in Transformers, and obtain results that suggest the infinite dimensionality of the standard Transformer kernel is partially responsible for its performance. This paper's results provide a new theoretical understanding of a very important but poorly understood model in modern machine learning.
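The kernel view of dot-product attention can be made concrete. This sketch assumes the usual unnormalized score, written here as $\kappa(x, y) = \exp(\langle W_Q x, W_K y\rangle / \sqrt{d})$. Softmax attention is then a kernel smoother whose weights are $\kappa(\text{query}, \text{key})$, normalized over the keys. Because $W_Q^\top W_K$ need not be symmetric, $\kappa(x, y) \neq \kappa(y, x)$ in general, which is one way the kernel fails to be Mercer. The function names and the matrices in the test are hypothetical.

```python
import math

# Hypothetical sketch of dot-product attention as an asymmetric (non-Mercer) kernel.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def attention_kernel(x, y, W_q, W_k):
    # kappa(x, y) = exp(<W_q x, W_k y> / sqrt(d)), d = query/key dimension.
    q, k = matvec(W_q, x), matvec(W_k, y)
    return math.exp(dot(q, k) / math.sqrt(len(q)))


def attention(query, keys, values, W_q, W_k):
    # Softmax attention as a kernel smoother: weights kappa(query, key_j) / sum_j.
    scores = [attention_kernel(query, k, W_q, W_k) for k in keys]
    total = sum(scores)
    dim = len(values[0])
    return [sum(s * v[i] for s, v in zip(scores, values)) / total for i in range(dim)]
```

With $W_Q = I$ and a non-symmetric $W_K$, swapping the two arguments changes the score. The exponential of an inner product also has an infinite-dimensional feature expansion, which matches the infinite feature dimension mentioned above.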
Solving Support Vector Machines in Reproducing Kernel Banach Spaces with Positive Definite Functions
Fasshauer, Gregory E., Hickernell, Fred J., Ye, Qi
In this paper we solve support vector machines in reproducing kernel Banach spaces whose reproducing kernels are defined on nonsymmetric domains, instead of using the traditional methods in reproducing kernel Hilbert spaces. Using the orthogonality of semi-inner-products, we obtain explicit representations of the dual (normalized-duality-mapping) elements of support vector machine solutions. In addition, we introduce the reproduction property in a generalized native space by Fourier transform techniques, so that it becomes a reproducing kernel Banach space, which can even be embedded into Sobolev spaces, and whose reproducing kernel is set up by the related positive definite function. The optimal solutions of support vector machines (regularized empirical risks) in these reproducing kernel Banach spaces are represented explicitly in terms of positive definite functions, and their finitely many coefficients can be computed by fixed point iteration. We also give some typical examples of reproducing kernel Banach spaces induced by Matérn functions (Sobolev splines), whose support vector machine solutions are as computable as those of the classical algorithms. Moreover, each of their reproducing bases includes information from multiple training data points. The concept of reproducing kernel Banach spaces thus offers a new numerical tool for solving support vector machines.
Learning in Hilbert vs. Banach Spaces: A Measure Embedding Viewpoint
Fukumizu, Kenji, Lanckriet, Gert R., Sriperumbudur, Bharath K.
The goal of this paper is to investigate the advantages and disadvantages of learning in Banach spaces over Hilbert spaces. While much work has been done on generalizing Hilbert methods to Banach spaces, in this paper we consider the simple problem of learning a Parzen window classifier in a reproducing kernel Banach space (RKBS), a problem closely related to the notion of embedding probability measures into an RKBS, in order to carefully understand its pros and cons relative to the Hilbert space classifier. We show that while this generalization yields richer distance measures on probabilities than its Hilbert space counterpart, it suffers from a serious computational drawback that limits its practical applicability, which demonstrates the need for developing efficient learning algorithms in Banach spaces.
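For comparison, the Hilbert space baseline can be sketched in a few lines. A Parzen window classifier compares the empirical kernel mean embeddings of the two classes at a test point. The (biased) squared maximum mean discrepancy is the corresponding RKHS distance between the two empirical distributions. The Gaussian kernel, the bandwidth `sigma` and all function names are assumptions. The RKBS variant studied in the paper is not implemented here.

```python
import math

# Hypothetical RKHS baseline: Parzen window classifier and squared MMD
# with a Gaussian kernel (the paper's RKBS variant is not shown).


def gaussian_kernel(x, z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 * sigma^2)).
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))


def parzen_classifier(pos, neg, sigma=1.0):
    # Compare the two empirical mean embeddings evaluated at x; return +1 or -1.
    def predict(x):
        mu_pos = sum(gaussian_kernel(x, p, sigma) for p in pos) / len(pos)
        mu_neg = sum(gaussian_kernel(x, q, sigma) for q in neg) / len(neg)
        return 1 if mu_pos >= mu_neg else -1
    return predict


def mmd_squared(xs, zs, sigma=1.0):
    # Biased estimate of ||mean_embedding(xs) - mean_embedding(zs)||^2 in the RKHS.
    def mean_k(a, b):
        return sum(gaussian_kernel(u, v, sigma) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(zs, zs) - 2.0 * mean_k(xs, zs)
```

Both quantities need only kernel evaluations against the training points. That is the computational convenience the Hilbert space classifier enjoys.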