Smola, Alex J., Schölkopf, Bernhard

Support Vector (SV) Machines for pattern recognition, regression estimation and operator inversion exploit the idea of transforming into a high dimensional feature space where they perform a linear algorithm. Instead of evaluating this map explicitly, one uses Hilbert Schmidt Kernels k(x, y) which correspond to dot products of the mapped data in high dimensional space, i.e. k(x, y) ( I (x) · I (y))

Smola, Alex J., Schölkopf, Bernhard

Pahikkala, Tapio, Viljanen, Markus, Airola, Antti, Waegeman, Willem

We consider the problem of learning regression functions from pairwise data when there exists prior knowledge that the relation to be learned is symmetric or anti-symmetric. Such prior knowledge is commonly enforced by symmetrizing or anti-symmetrizing pairwise kernel functions. Through spectral analysis, we show that these transformations reduce the kernel's effective dimension. Further, we provide an analysis of the approximation properties of the resulting kernels, and bound the regularization bias of the kernels in terms of the corresponding bias of the original kernel.

Kloft, Marius, Rückert, Ulrich, Bartlett, Peter L.

Recent research on multiple kernel learning has lead to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper we present a unifying general optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically using a Rademacher complexity bound on the generalization error and empirically in a set of experiments.

This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hibert spaces, the natural extension of the set of functions $\{k(x,\cdot),x\in\mathcal{X}\}$ associated with a kernel $k$ defined on a space $\mathcal{X}$. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods which take advantage of reproducing kernel Hilbert spaces and discuss the idea of combining several kernels to improve the performance on certain tasks. We also provide a short cookbook of different kernels which are particularly useful for certain data-types such as images, graphs or speech segments.