Support Vector Machines


Regularizing AdaBoost

Neural Information Processing Systems

Boosting methods maximize a hard classification margin and are known as powerful techniques that do not exhibit overfitting in low-noise cases. For noisy data, however, boosting still tries to enforce a hard margin and thereby gives too much weight to outliers, which leads to the dilemma of non-smooth fits and overfitting. We therefore propose three algorithms that allow soft margin classification by introducing regularization with slack variables into the boosting concept: (1) AdaBoost_reg and regularized versions of (2) linear and (3) quadratic programming AdaBoost. Experiments show the usefulness of the proposed algorithms in comparison with another soft margin classifier, the support vector machine.
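The outlier problem described above can be reproduced with plain AdaBoost. The sketch below is ordinary (unregularized) AdaBoost with one-dimensional decision stumps, not the paper's AdaBoost_reg or its LP/QP variants; the stump learner, the helper names (`stump`, `adaboost`, `predict`) and the toy data with one deliberately mislabeled point are illustrative assumptions.

```python
import math

def stump(x, theta, s):
    """Decision stump: predict s to the right of theta, -s to the left."""
    return s if x > theta else -s

def adaboost(xs, ys, rounds):
    """Plain AdaBoost with 1-D stumps; returns the ensemble and final sample weights."""
    n = len(xs)
    w = [1.0 / n] * n                      # start from uniform sample weights
    pts = sorted(set(xs))
    # Candidate thresholds: below all points, between neighbours, above all points
    thetas = ([pts[0] - 0.5]
              + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
              + [pts[-1] + 0.5])
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the smallest weighted training error
        best = None
        for theta in thetas:
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                          if stump(xi, theta, s) != yi)
                if best is None or err < best[0]:
                    best = (err, theta, s)
        err, theta, s = best
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, theta, s))
        # Exponential reweighting: misclassified points gain weight
        w = [wi * math.exp(-alpha * yi * stump(xi, theta, s))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, w

def predict(ensemble, x):
    score = sum(alpha * stump(x, theta, s) for alpha, theta, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data: the region x > 3.5 is positive, but the point x = 6 is mislabeled
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, -1]
_, w1 = adaboost(xs, ys, rounds=1)
print(round(w1[-1], 3))                    # 0.5
ens3, _ = adaboost(xs, ys, rounds=3)
print([predict(ens3, x) for x in xs])      # [-1, -1, -1, 1, 1, -1]
```

On this data the mislabeled point carries half of the total weight after a single round, and after three rounds the ensemble reproduces its noisy label, i.e. it has fitted the outlier exactly. Damping this behaviour through slack variables is what the soft margin variants are meant to do.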


Shrinking the Tube: A New Support Vector Regression Algorithm

Neural Information Processing Systems

A new algorithm for Support Vector regression is described. For an a priori chosen ν, it automatically adjusts a flexible tube of minimal radius to the data such that at most a fraction ν of the data points lie outside. Moreover, it is shown how to use parametric tube shapes with non-constant radius. The algorithm is analysed theoretically and experimentally.
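The mechanism that makes ν an upper bound on the fraction of points outside the tube can be checked in isolation. In ν-SVR the tube radius ε is itself a variable, penalized by C·ν·ε alongside the average ε-insensitive loss. The sketch below holds the regression function fixed (so only its residuals matter) and minimizes ν·ε + (1/ℓ)·Σ max(0, |r_i| − ε) over ε ≥ 0; the full algorithm optimizes ε jointly with the function through a quadratic program, which is not shown. Function names and residual values are hypothetical.

```python
def eps_objective(eps, residuals, nu):
    """nu * eps + mean epsilon-insensitive loss, for a fixed regression function."""
    n = len(residuals)
    return nu * eps + sum(max(0.0, abs(r) - eps) for r in residuals) / n

def best_tube_radius(residuals, nu):
    """Minimize eps_objective over eps >= 0.

    The objective is convex and piecewise linear in eps with kinks at |r_i|,
    so its minimum over eps >= 0 is attained at 0 or at one of the |r_i|.
    """
    candidates = [0.0] + [abs(r) for r in residuals]
    return min(candidates, key=lambda e: eps_objective(e, residuals, nu))

def fraction_outside(residuals, eps):
    """Fraction of points strictly outside the tube of radius eps."""
    return sum(abs(r) > eps for r in residuals) / len(residuals)

residuals = [0.1, -0.5, 0.3, 2.0, -0.05, 0.7, 1.2, -0.2, 0.4, -3.0]
for nu in (0.25, 0.45):
    eps = best_tube_radius(residuals, nu)
    print(nu, eps, fraction_outside(residuals, eps))
# prints:
# 0.25 1.2 0.2
# 0.45 0.5 0.4
```

At any minimizer the right derivative ν − #{|r_i| > ε}/ℓ is non-negative, which is exactly the statement that at most a fraction ν of the points lie outside the tube; the printed fractions respect that bound, and the chosen ε shrinks as ν grows.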


The Relevance Vector Machine

Neural Information Processing Systems

The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a trade-off parameter and the need to utilise 'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.


Model Selection for Support Vector Machines

Neural Information Processing Systems

New functionals for parameter (model) selection of Support Vector Machines are introduced, based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that, using these functionals, one can predict both the best choice of parameters of the model and the relative quality of performance for any value of the parameters.


Probabilistic Methods for Support Vector Machines

Neural Information Processing Systems

I describe a framework for interpreting Support Vector Machines (SVMs) as maximum a posteriori (MAP) solutions to inference problems with Gaussian Process priors. This can provide intuitive guidelines for choosing a 'good' SVM kernel. It can also assign (by evidence maximization) optimal values to parameters such as the noise level C, which cannot be determined unambiguously from properties of the MAP solution alone (such as cross-validation error). I illustrate this using a simple approximate expression for the SVM evidence. Once C has been determined, error bars on SVM predictions can also be obtained. Support Vector Machines (SVMs) have recently been the subject of intense research activity within the neural networks community; for tutorial introductions and overviews of recent developments see [1, 2, 3].
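The MAP reading can be spelled out in a few lines. In the sketch below (our notation, with the offset b omitted for simplicity), θ is the vector of latent function values θ(x_i) on the n training inputs, K is the kernel matrix, assumed invertible, and l is the hinge loss; the likelihood factor is left unnormalized, a point a complete probabilistic treatment has to deal with.

```latex
\begin{aligned}
-\ln P(\boldsymbol{\theta}) &= \tfrac{1}{2}\,\boldsymbol{\theta}^{\top} K^{-1} \boldsymbol{\theta} + \text{const}, \\
Q(y \mid x, \theta) &\propto \exp\!\bigl(-C\, l(y\,\theta(x))\bigr), \qquad l(z) = \max(0,\, 1 - z), \\
-\ln P(\boldsymbol{\theta} \mid D) &= \tfrac{1}{2}\,\boldsymbol{\theta}^{\top} K^{-1} \boldsymbol{\theta}
  + C \sum_{i=1}^{n} l\bigl(y_i\,\theta(x_i)\bigr) + \text{const}.
\end{aligned}
```

The last line is the usual SVM objective: writing θ = K a for the kernel expansion coefficients a, the prior term becomes ½ aᵀK a = ½‖θ‖² in the kernel's feature space. Minimizing it therefore yields the MAP solution, with C playing the role of the noise parameter in the likelihood.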


An Improved Decomposition Algorithm for Regression Support Vector Machines

Neural Information Processing Systems

A new decomposition algorithm for training regression Support Vector Machines (SVM) is presented. The algorithm builds on the basic principles of decomposition proposed by Osuna et al. New criteria for testing the optimality of a working set are derived. Based on these criteria, the principle of "maximal inconsistency" is proposed to form (approximately) optimal working sets. Experimental results show superior performance of the new algorithm in comparison with traditional training of regression SVM without decomposition.


Support Vector Method for Novelty Detection

Neural Information Processing Systems

Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified ν between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. We provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
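The role of ν can be checked for a fixed kernel expansion. Assuming the standard primal of this method, ½‖w‖² + (1/(νℓ))·Σ ξ_i − ρ subject to ⟨w, Φ(x_i)⟩ ≥ ρ − ξ_i, fixing w leaves a one-dimensional problem in the offset ρ. The sketch below fixes the expansion coefficients to the uniform value 1/ℓ (a Parzen-window-like choice made for illustration, not the coefficients the paper's quadratic program would produce), uses a Gaussian kernel with width 1, and optimizes only ρ; all names and data are hypothetical.

```python
import math

def rbf(x, z, sigma=1.0):
    """Gaussian kernel k(x, z) = exp(-|x - z|^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))

def expansion(data, alphas, x):
    """Kernel expansion g(x) = sum_i alpha_i k(x_i, x), i.e. <w, Phi(x)>."""
    return sum(a * rbf(xi, x) for a, xi in zip(alphas, data))

def best_offset(g_values, nu):
    """Minimize (1/(nu*n)) * sum_i max(0, rho - g_i) - rho over rho, for 0 < nu <= 1.

    The objective is convex and piecewise linear with kinks at the g_i,
    so one of the g_i is optimal.
    """
    n = len(g_values)

    def objective(rho):
        return sum(max(0.0, rho - g) for g in g_values) / (nu * n) - rho

    return min(g_values, key=objective)

data = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,)]   # (5.0,) is an outlier
alphas = [1.0 / len(data)] * len(data)            # hypothetical fixed coefficients
g = [expansion(data, alphas, x) for x in data]
rho = best_offset(g, nu=0.3)

def f(x):
    """Decision function: positive inside the estimated region S, negative outside."""
    return expansion(data, alphas, x) - rho

print([f(x) < 0 for x in data])   # [False, False, False, False, True]
```

At the optimal ρ the fraction of training points with f(x_i) < 0 is at most ν (here 1 of 5 points for ν = 0.3), mirroring the role ν plays in the abstract. Points with f(x_i) = 0 lie exactly on the boundary of S.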


A Geometric Interpretation of ν-SVM Classifiers

Neural Information Processing Systems

We show that the recently proposed variant of the Support Vector machine (SVM) algorithm, known as ν-SVM, can be interpreted as a maximal separation between subsets of the convex hulls of the data, which we call soft convex hulls. The soft convex hulls are controlled by choice of the parameter ν. If the intersection of the convex hulls is empty, the hyperplane is positioned halfway between them such that the distance between convex hulls, measured along the normal, is maximized; and if it is not, the hyperplane's normal is similarly determined by the soft convex hulls, but its position (perpendicular distance from the origin) is adjusted to minimize the error sum. The proposed geometric interpretation of ν-SVM also leads to necessary and sufficient conditions for the existence of a choice of ν for which the ν-SVM solution is nontrivial.
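A soft convex hull can be made concrete through its extreme projections. If each point's coefficient in a convex combination is capped at some μ (with μ ≥ 1/n so the set is non-empty), the largest projection of the hull onto a normal w is obtained greedily by giving the full cap to the most extreme points first. The sketch below computes these extremes for two classes along a given w and places the threshold halfway between them, as in the separable case described above. The cap μ is a generic parameter here; how it maps to ν is worked out in the paper and not reproduced, and all names and points are hypothetical.

```python
def soft_hull_extreme(points, w, mu, largest=True):
    """Max (or min) of <w, sum_i c_i x_i> over the soft convex hull
    {sum_i c_i x_i : sum_i c_i = 1, 0 <= c_i <= mu}.

    Greedy: assign the cap mu to the most extreme projections first.
    """
    if mu * len(points) < 1 - 1e-12:
        raise ValueError("soft convex hull is empty: need mu >= 1/len(points)")
    proj = sorted((sum(wi * xi for wi, xi in zip(w, x)) for x in points),
                  reverse=largest)
    remaining, value = 1.0, 0.0
    for p in proj:
        c = min(mu, remaining)
        value += c * p
        remaining -= c
        if remaining <= 0:
            break
    return value

def halfway_offset(pos, neg, w, mu):
    """Gap between the two soft hulls along w, and the midpoint threshold."""
    lo_pos = soft_hull_extreme(pos, w, mu, largest=False)  # inner edge of the + class
    hi_neg = soft_hull_extreme(neg, w, mu, largest=True)   # inner edge of the - class
    return lo_pos - hi_neg, (lo_pos + hi_neg) / 2

pos = [(2.0, 0.0), (3.0, 1.0), (6.0, 0.0)]
neg = [(-1.0, 0.0), (0.0, 1.0), (-4.0, 0.0)]
w = (1.0, 0.0)
for mu in (1.0, 0.5):
    print(mu, halfway_offset(pos, neg, w, mu))
# prints:
# 1.0 (2.0, 1.0)
# 0.5 (3.0, 1.0)
```

With μ = 1 the sets are the ordinary convex hulls; shrinking μ pulls each soft hull toward its class centroid, so the gap along w grows from 2.0 to 3.0 here. The sketch only evaluates a given normal; choosing w to maximize the separation is the optimization problem the abstract relates to ν-SVM.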


Understanding Stepwise Generalization of Support Vector Machines: a Toy Model

Neural Information Processing Systems

In this article we study the effects of introducing structure in the input distribution of the data to be learnt by a simple perceptron. We determine the learning curves within the framework of Statistical Mechanics. Stepwise generalization occurs as a function of the number of examples when the distribution of patterns is highly anisotropic. Although extremely simple, the model seems to capture the relevant features of a class of Support Vector Machines which was recently shown to present this behavior.


Some Theoretical Results Concerning the Convergence of Compositions of Regularized Linear Functions

Neural Information Processing Systems

Recently, sample complexity bounds have been derived for problems involving linear functions such as neural networks and support vector machines. In this paper, we extend some theoretical results in this area by deriving dimension-independent covering number bounds for regularized linear functions under certain regularization conditions. We show that such bounds lead to a class of new methods for training linear classifiers with theoretical advantages similar to those of the support vector machine. Furthermore, we also present a theoretical analysis of these new methods from the asymptotic statistical point of view. This analysis provides a better description of the large-sample behavior of these algorithms.