 Support Vector Machines


Support Vector Regression Machines

Neural Information Processing Systems

A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.
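The closing claim can be illustrated with a minimal sketch, which is not taken from the paper. A kernel method touches the inputs only through an n-by-n Gram matrix, so the size of the optimization problem is set by the number of training points rather than by the input dimension. The Gaussian kernel, the `gamma` value, and the helper names are assumptions chosen for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel; gamma is an illustrative choice, not a value from the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel=rbf_kernel):
    """Return the n-by-n matrix of pairwise kernel values."""
    return [[kernel(p, q) for q in points] for p in points]

# Three training points in 2 dimensions and three in 1000 dimensions.
low_dim = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
high_dim = [[float((i + j) % 3) for j in range(1000)] for i in range(3)]

K_low = gram_matrix(low_dim)
K_high = gram_matrix(high_dim)
# Both matrices are 3-by-3: the dual problem has one variable per training point.
```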


Improving the Accuracy and Speed of Support Vector Machines

Neural Information Processing Systems

Support Vector Learning Machines (SVM) are finding application in pattern recognition, regression estimation, and operator inversion for ill-posed problems. Against this very general backdrop, any methods for improving the generalization performance, or for improving the speed in test phase, of SVMs are of increasing interest. In this paper we combine two such techniques on a pattern recognition problem. The method for improving generalization performance (the "virtual support vector" method) does so by incorporating known invariances of the problem. This method achieves a drop in the error rate on 10,000 NIST test digit images from 1.4% to 1.0%.
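As a rough sketch of how known invariances might be incorporated, the code below generates "virtual" examples by translating each support vector image by one pixel in four directions, keeping the original label. The choice of translation as the invariance, the one-pixel step, and the function names are assumptions for illustration; the abstract does not list the transformations used.

```python
def shift_image(image, dx, dy, fill=0):
    """Translate a 2D list-of-lists image by (dx, dy) pixels, padding with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = image[r][c]
    return shifted

def virtual_examples(support_vectors, labels):
    """Return (image, label) pairs for the four one-pixel shifts of each input."""
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed invariance: small translations
    out = []
    for image, label in zip(support_vectors, labels):
        for dx, dy in shifts:
            out.append((shift_image(image, dx, dy), label))
    return out

digit = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
augmented = virtual_examples([digit], [7])
```

A second machine would then be trained on the augmented set; that training step is omitted here.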


From Regularization Operators to Support Vector Kernels

Neural Information Processing Systems

We derive the correspondence between regularization operators used in Regularization Networks and Hilbert-Schmidt Kernels appearing in Support Vector Machines. More specifically, we prove that the Green's Functions associated with regularization operators are suitable Support Vector Kernels with equivalent regularization properties. As a by-product we show that a large number of Radial Basis Functions, namely conditionally positive definite functions, may be used as Support Vector kernels.


Classification by Pairwise Coupling

Neural Information Processing Systems

We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in simulated datasets. The classifiers used include linear discriminants and nearest neighbors; application to support vector machines is also briefly described.
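The coupling step can be sketched as follows, under stated assumptions. Given pairwise estimates `r[i][j]` of the probability of class i when only classes i and j are considered, the code looks for class probabilities `p` whose Bradley-Terry ratios `p[i] / (p[i] + p[j])` match them, using a simple multiplicative update with renormalization. The function name, the uniform pair weights `n`, and the fixed number of sweeps are illustrative choices, not details given in the abstract.

```python
def couple_pairwise(r, n=None, sweeps=500):
    """Estimate class probabilities from pairwise estimates r[i][j] ~ P(i | i or j)."""
    k = len(r)
    if n is None:
        n = [[1.0] * k for _ in range(k)]  # assumed: equal weight for every pair
    p = [1.0 / k] * k
    for _ in range(sweeps):
        for i in range(k):
            observed = sum(n[i][j] * r[i][j] for j in range(k) if j != i)
            modelled = sum(n[i][j] * p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= observed / modelled
            total = sum(p)
            p = [v / total for v in p]  # keep p on the probability simplex
    return p

# Pairwise estimates that are exactly consistent with known class probabilities.
true_p = [0.5, 0.3, 0.2]
r = [[true_p[i] / (true_p[i] + true_p[j]) if i != j else 0.0 for j in range(3)]
     for i in range(3)]
estimate = couple_pairwise(r)
```

When the pairwise estimates are mutually consistent, as in this toy input, the coupled probabilities reproduce the generating ones; with noisy estimates the result is a compromise among them.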


Generalization in Decision Trees and DNF: Does Size Matter?

Neural Information Processing Systems

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other boolean functions as two-layer neural networks (thresholded convex combinations of boolean functions). For example, we show that with high probability any decision tree of depth no more than d that is consistent with m training examples has misclassification probability no more than $O\left(\left(\frac{1}{m}\, N_{\mathrm{eff}}\, \mathrm{VCdim}(U) \log^2 m \log d\right)^{1/2}\right)$, where U is the class of node decision functions, and $N_{\mathrm{eff}} \le N$ can be thought of as the effective number of leaves (it becomes small as the distribution on the leaves induced by the training data gets far from uniform). This bound is qualitatively different from the VC bound and can be considerably smaller. We use the same technique to give similar results for DNF formulae.


Exploiting Generative Models in Discriminative Classifiers

Neural Information Processing Systems

Generative probability models such as hidden Markov models provide a principled way of treating missing information and dealing with variable length sequences. On the other hand, discriminative methods such as support vector machines enable us to construct flexible decision boundaries and often result in classification performance superior to that of the model based approaches. An ideal classifier should combine these two complementary approaches. In this paper, we develop a natural way of achieving this combination by deriving kernel functions for use in discriminative methods such as support vector machines from generative probability models. We provide a theoretical justification for this combination as well as demonstrate a substantial improvement in the classification performance in the context of DNA and protein sequence analysis.
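The abstract does not spell out the construction. One well-known way to derive a kernel from a generative model is to compare examples through the gradient of their log-likelihood with respect to the model parameters (the Fisher score), weighted by the inverse Fisher information. The sketch below does this for a toy model of independent Bernoulli features; the toy model and the function names are assumptions for illustration, whereas the paper's setting involves hidden Markov models.

```python
def fisher_score(x, theta):
    """Gradient of log P(x | theta) for independent Bernoulli features."""
    return [(xi - t) / (t * (1.0 - t)) for xi, t in zip(x, theta)]

def fisher_kernel(x, y, theta):
    """Inner product of Fisher scores weighted by the inverse Fisher information."""
    ux = fisher_score(x, theta)
    uy = fisher_score(y, theta)
    # For Bernoulli(t) the Fisher information is 1 / (t (1 - t)),
    # so its inverse is t (1 - t), applied per feature.
    return sum(a * t * (1.0 - t) * b for a, b, t in zip(ux, uy, theta))

theta = [0.2, 0.7]  # hypothetical fitted parameters
k_same = fisher_kernel([1, 0], [1, 0], theta)
k_diff = fisher_kernel([1, 0], [0, 1], theta)
```

The resulting function can be passed to any kernel-based classifier, which is how a generative model and a discriminative method end up combined.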


Semi-Supervised Support Vector Machines

Neural Information Processing Systems

We introduce a semi-supervised support vector machine (S3VM) method. Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transduction problem using overall risk minimization (ORM) posed by Vapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data.


Support Vector Machines Applied to Face Recognition

Neural Information Processing Systems

Face recognition is a K-class problem. The face recognition problem is formulated as a problem in difference space. In difference space we formulate face recognition as a two-class problem. The classes are: dissimilarities between faces of the same person, and dissimilarities between faces of different people. By modifying the interpretation of the decision surface generated by SVM.
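A minimal sketch of the difference-space reformulation, under stated assumptions: every pair of face vectors is turned into one difference vector, labeled by whether the two images share an identity. This reduces a problem with K identities to a two-class problem. The use of signed element-wise differences, the +1/-1 labels, and the function name are illustrative choices not specified in the text.

```python
from itertools import combinations

def difference_space(faces, identities):
    """Turn labeled face vectors into (difference vector, label) training pairs."""
    samples = []
    for (fa, ia), (fb, ib) in combinations(zip(faces, identities), 2):
        diff = [a - b for a, b in zip(fa, fb)]  # assumed: element-wise difference
        label = 1 if ia == ib else -1           # +1: same person, -1: otherwise
        samples.append((diff, label))
    return samples

# Toy feature vectors for three images of two people (hypothetical data).
faces = [[1.0, 2.0], [1.5, 2.5], [4.0, 0.0]]
identities = ["a", "a", "b"]
pairs = difference_space(faces, identities)
```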


Dynamically Adapting Kernels in Support Vector Machines

Neural Information Processing Systems

The kernel parameter is one of the few tunable parameters in Support Vector Machines, controlling the complexity of the resulting hypothesis. Its choice amounts to model selection and its value is usually found by means of a validation set. We present an algorithm which can automatically perform model selection with little additional computational cost and with no need of a validation set. In this procedure model selection and learning are not separate, but kernels are dynamically adjusted during the learning process to find the kernel parameter which provides the best possible upper bound on the generalisation error. Theoretical results motivating the approach and experimental results confirming its validity are presented.


Using Analytic QP and Sparseness to Speed Training of Support Vector Machines

Neural Information Processing Systems

Training a Support Vector Machine (SVM) requires the solution of a very large quadratic programming (QP) problem. This paper proposes an algorithm for training SVMs: Sequential Minimal Optimization, or SMO. SMO breaks the large QP problem into a series of smallest possible QP problems which are analytically solvable. Thus, SMO does not require a numerical QP library. SMO's computation time is dominated by evaluation of the kernel, hence kernel optimizations substantially quicken SMO.
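The "smallest possible QP problem" involves two Lagrange multipliers, and its closed-form solution can be sketched as below. This is a simplified illustration of the commonly described two-variable update, not the paper's full algorithm: the pair-selection heuristics, threshold update, and error caching are omitted, and the degenerate case is simply skipped. The function name and argument layout are assumptions; `e1` and `e2` denote prediction errors f(x_i) - y_i.

```python
def smo_pair_update(a1, a2, y1, y2, e1, e2, k11, k12, k22, c):
    """Analytically optimize two multipliers, keeping y1*a1 + y2*a2 fixed."""
    # Ends of the feasible segment: both multipliers must stay in [0, c].
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)
    eta = k11 + k22 - 2.0 * k12  # curvature of the objective along the segment
    if eta <= 0.0 or low >= high:
        return a1, a2  # simplification: skip degenerate pairs
    a2_new = a2 + y2 * (e1 - e2) / eta      # unconstrained optimum
    a2_new = min(high, max(low, a2_new))    # clip to the feasible segment
    a1_new = a1 + y1 * y2 * (a2 - a2_new)   # restore the equality constraint
    return a1_new, a2_new

# Two orthogonal unit-norm points of opposite class, starting from zero
# multipliers, so f(x) = 0 and each error is -y.
a1, a2 = smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, 0.0, 1.0, c=10.0)
```

Because each step needs only a handful of kernel values and arithmetic operations, no numerical QP solver is involved, which matches the claim that kernel evaluation dominates the run time.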