We consider a general regularised interpolation problem for learning a parameter vector from data. The well known representer theorem says that under certain conditions on the regulariser there exists a solution in the linear span of the data points. This is the core of kernel methods in machine learning as it makes the problem computationally tractable. Necessary and sufficient conditions for differentiable regularisers on Hilbert spaces to admit a representer theorem have been proved. We extend those results to nondifferentiable regularisers on uniformly convex and uniformly smooth Banach spaces. This gives a (more) complete answer to the question when there is a representer theorem. We then note that for regularised interpolation in fact the solution is determined by the function space alone and independent of the regulariser, making the extension to Banach spaces even more valuable.
We discuss construction of coverings of the unit ball of a finite dimensional Banach space. The well known technique of comparing volumes gives upper and lower bounds on covering numbers. This technique does not provide a construction of good coverings. Here we apply incoherent dictionaries for construction of good coverings. We use the following strategy. First, we build a good covering by balls with a radius close to one. Second, we iterate this construction to obtain a good covering for any radius. We mostly concentrate on the first step of this strategy.
In this paper we solve support vector machines in reproducing kernel Banach spaces with reproducing kernels defined on nonsymmetric domains instead of the traditional methods in reproducing kernel Hilbert spaces. Using the orthogonality of semi-inner-products, we can obtain the explicit representations of the dual (normalized-duality-mapping) elements of support vector machine solutions. In addition, we can introduce the reproduction property in a generalized native space by Fourier transform techniques such that it becomes a reproducing kernel Banach space, which can be even embedded into Sobolev spaces, and its reproducing kernel is set up by the related positive definite function. The representations of the optimal solutions of support vector machines (regularized empirical risks) in these reproducing kernel Banach spaces are formulated explicitly in terms of positive definite functions, and their finite numbers of coefficients can be computed by fixed point iteration. We also give some typical examples of reproducing kernel Banach spaces induced by Mat\'ern functions (Sobolev splines) so that their support vector machine solutions are well computable as the classical algorithms. Moreover, each of their reproducing bases includes information from multiple training data points. The concept of reproducing kernel Banach spaces offers us a new numerical tool for solving support vector machines.
Recently, there has been emerging interest in constructing reproducing kernel Banach spaces (RKBS) for applied and theoretical purposes such as machine learning, sampling reconstruction, sparse approximation and functional analysis. Existing constructions include the reflexive RKBS via a bilinear form, the semi-inner-product RKBS, the RKBS with $\ell^1$ norm, the $p$-norm RKBS via generalized Mercer kernels, etc. The definitions of RKBS and the associated reproducing kernel in those references are dependent on the construction. Moreover, relations among those constructions are unclear. We explore a generic definition of RKBS and the reproducing kernel for RKBS that is independent of construction. Furthermore, we propose a framework of constructing RKBSs that unifies existing constructions mentioned above via a continuous bilinear form and a pair of feature maps. A new class of Orlicz RKBSs is proposed. Finally, we develop representer theorems for machine learning in RKBSs constructed in our framework, which also unifies representer theorems in existing RKBSs.
The problem of convex optimization is studied. Usually in convex optimization the minimization is over a d-dimensional domain. Very often the convergence rate of an optimization algorithm depends on the dimension d. The algorithms studied in this paper utilize dictionaries instead of a canonical basis used in the coordinate descent algorithms. We show how this approach allows us to reduce dimensionality of the problem. Also, we investigate which properties of a dictionary are beneficial for the convergence rate of typical greedy-type algorithms.