Online Kernel Selection: Algorithms and Evaluations

AAAI Conferences

Kernel methods have been successfully applied to many machine learning problems. Nevertheless, since the performance of kernel methods depends heavily on the type of kernels being used, identifying good kernels among a set of given kernels is important to the success of kernel methods. A straightforward approach to address this problem is cross-validation by training a separate classifier for each kernel and choosing the best kernel classifier out of them. Another approach is Multiple Kernel Learning (MKL), which aims to learn a single kernel classifier from an optimal combination of multiple kernels. However, both approaches suffer from a high computational cost in computing the full kernel matrices and in training, especially when the number of kernels or the number of training examples is very large. In this paper, we tackle this problem by proposing an efficient online kernel selection algorithm. It incrementally learns a weight for each kernel classifier. The weight for each kernel classifier can help us to select a good kernel among a set of given kernels. The proposed approach is efficient in that (i) it is an online approach and therefore avoids computing all the full kernel matrices before training; (ii) it only updates a single kernel classifier each time by a sampling technique and therefore saves time on updating kernel classifiers with poor performance; (iii) it has a theoretically guaranteed performance compared to the best kernel predictor. Empirical studies on image classification tasks demonstrate the effectiveness of the proposed approach for selecting a good kernel among a set of kernels.


Influence Function and Robust Variant of Kernel Canonical Correlation Analysis

arXiv.org Machine Learning

Many unsupervised kernel methods rely on the estimation of the kernel covariance operator (kernel CO) or kernel cross-covariance operator (kernel CCO). Both kernel CO and kernel CCO are sensitive to contaminated data, even when bounded positive definite kernels are used. To the best of our knowledge, there are few well-founded robust kernel methods for statistical unsupervised learning. In addition, while the influence function (IF) of an estimator can characterize its robustness, asymptotic properties and standard error, the IF of a standard kernel canonical correlation analysis (standard kernel CCA) has not been derived yet. To fill this gap, we first propose a robust kernel covariance operator (robust kernel CO) and a robust kernel cross-covariance operator (robust kernel CCO) based on a generalized loss function instead of the quadratic loss function. Second, we derive the IF for robust kernel CCO and standard kernel CCA. Using the IF of the standard kernel CCA, we can detect influential observations from two sets of data. Finally, we propose a method based on the robust kernel CO and the robust kernel CCO, called {\bf robust kernel CCA}, which is less sensitive to noise than the standard kernel CCA. The introduced principles can also be applied to many other kernel methods involving kernel CO or kernel CCO. Our experiments on synthesized data and imaging genetics analysis demonstrate that the proposed IF of standard kernel CCA can identify outliers. It is also seen that the proposed robust kernel CCA method performs better for ideal and contaminated data than the standard kernel CCA.


Robust Kernel (Cross-) Covariance Operators in Reproducing Kernel Hilbert Space toward Kernel Methods

arXiv.org Machine Learning

To the best of our knowledge, there are no general well-founded robust methods for statistical unsupervised learning. Most of the unsupervised methods explicitly or implicitly depend on the kernel covariance operator (kernel CO) or kernel cross-covariance operator (kernel CCO). They are sensitive to contaminated data, even when using bounded positive definite kernels. First, we propose robust kernel covariance operator (robust kernel CO) and robust kernel crosscovariance operator (robust kernel CCO) based on a generalized loss function instead of the quadratic loss function. Second, we propose influence function of classical kernel canonical correlation analysis (classical kernel CCA). Third, using this influence function, we propose a visualization method to detect influential observations from two sets of data. Finally, we propose a method based on robust kernel CO and robust kernel CCO, called robust kernel CCA, which is designed for contaminated data and less sensitive to noise than classical kernel CCA. The principles we describe also apply to many kernel methods which must deal with the issue of kernel CO or kernel CCO. Experiments on synthesized and imaging genetics analysis demonstrate that the proposed visualization and robust kernel CCA can be applied effectively to both ideal data and contaminated data. The robust methods show the superior performance over the state-of-the-art methods.


On Valid Optimal Assignment Kernels and Applications to Graph Classification

Neural Information Processing Systems

The success of kernel methods has initiated the design of novel positive semidefinite functions, in particular for structured data. A leading design paradigm for this is the convolution kernel, which decomposes structured objects into their parts and sums over all pairs of parts. Assignment kernels, in contrast, are obtained from an optimal bijection between parts, which can provide a more valid notion of similarity. In general however, optimal assignments yield indefinite functions, which complicates their use in kernel methods. We characterize a class of base kernels used to compare parts that guarantees positive semidefinite optimal assignment kernels. These base kernels give rise to hierarchies from which the optimal assignment kernels are computed in linear time by histogram intersection. We apply these results by developing the Weisfeiler-Lehman optimal assignment kernel for graphs. It provides high classification accuracy on widely-used benchmark data sets improving over the original Weisfeiler-Lehman kernel.