Sparsity of Data Representation of Optimal Kernel Machine and Leave-one-out Estimator

Kowalczyk, Adam

Neural Information Processing Systems 

Vapnik's result that the expectation of the generalisation error of the optimal hyperplane is bounded by the expectation of the ratio of the number of support vectors to the number of training examples is extended to a broad class of kernel machines. The class includes Support Vector Machines for soft margin classification and regression, and Regularization Networks with a variety of kernels and cost functions. We show that key inequalities in Vapnik's result become equalities once "the classification error" is replaced by "the margin error", with the latter defined as an instance with positive cost. In particular, we show that the expectations of the true margin error and the empirical margin error are equal, and that sparse solutions for kernel machines are possible only if the cost function is "partially" insensitive.

1 Introduction

Minimization of regularized risk is a backbone of several recent advances in machine learning, including Support Vector Machines (SVM) [13], Regularization Networks (RN) [5] and Gaussian Processes [15]. Such a machine is typically implemented as a weighted sum of a kernel function evaluated on pairs composed of the data vector in question and a number of selected training vectors, the so-called support vectors.
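The following is a minimal sketch, not taken from the paper, illustrating the quantity at the heart of Vapnik's bound for one member of the kernel-machine class discussed here: a soft-margin SVM with an RBF kernel. It trains on hypothetical synthetic data with scikit-learn, computes the support-vector ratio #SV/n, and compares it against the exact leave-one-out error it is meant to bound in expectation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical toy data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(loc=-1.0, size=(n // 2, 2)),
               rng.normal(loc=+1.0, size=(n // 2, 2))])
y = np.array([-1] * (n // 2) + [+1] * (n // 2))

# Soft-margin SVM with an RBF kernel; its decision function is a weighted
# sum of kernel evaluations between the test point and the support vectors.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

# Support-vector ratio: the sparsity measure that bounds the expected
# leave-one-out (generalisation) error in Vapnik's result.
sv_ratio = len(clf.support_) / n

# Exact leave-one-out error for comparison (error = 1 - accuracy per fold).
loo_err = 1.0 - cross_val_score(SVC(kernel="rbf", C=1.0, gamma=1.0),
                                X, y, cv=LeaveOneOut()).mean()

print(f"support-vector ratio (bound): {sv_ratio:.3f}")
print(f"leave-one-out error:          {loo_err:.3f}")
```

On data of this kind the leave-one-out error typically comes out well below the support-vector ratio, consistent with the ratio acting as an upper estimate rather than an exact predictor.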
