Support Vector Machines
Learning a Distance Metric from Relative Comparisons
Schultz, Matthew, Joachims, Thorsten
This paper presents a method for learning a distance metric from relative comparison such as "A is closer to B than A is to C". Taking a Support Vector Machine (SVM) approach, we develop an algorithm that provides a flexible way of describing qualitative training data as a set of constraints. We show that such constraints lead to a convex quadratic programming problem that can be solved by adapting standard methods for SVM training. We empirically evaluate the performance and the modelling flexibility of the algorithm on a collection of text documents.
Invariant Pattern Recognition by Semi-Definite Programming Machines
Graepel, Thore, Herbrich, Ralf
Knowledge about local invariances with respect to given pattern transformations can greatly improve the accuracy of classification. Previous approaches are either based on regularisation or on the generation of virtual (transformed) examples. We develop a new framework for learning linear classifiers under known transformations based on semidefinite programming. We present a new learning algorithm-- the Semidefinite Programming Machine (SDPM)--which is able to find a maximum margin hyperplane when the training examples are polynomial trajectories instead of single points. The solution is found to be sparse in dual variables and allows to identify those points on the trajectory with minimal real-valued output as virtual support vectors. Extensions to segments of trajectories, to more than one transformation parameter, and to learning with kernels are discussed. In experiments we use a Taylor expansion to locally approximate rotational invariance in pixel images from USPS and find improvements over known methods.
Learning a Distance Metric from Relative Comparisons
Schultz, Matthew, Joachims, Thorsten
This paper presents a method for learning a distance metric from relative comparisonsuch as "A is closer to B than A is to C". Taking a Support Vector Machine (SVM) approach, we develop an algorithm that provides a flexible way of describing qualitative training data as a set of constraints. We show that such constraints lead to a convex quadratic programming problem that can be solved by adapting standard methods forSVM training. We empirically evaluate the performance and the modelling flexibility of the algorithm on a collection of text documents.
Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
Bartlett, Peter L., Jordan, Michael I., Mcauliffe, Jon D.
Many classification algorithms, including the support vector machine, boosting and logistic regression, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. We characterize the statistical consequences of using such a surrogate by providing ageneral quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial bounds under theweakest possible condition on the loss function--that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we present applications of our results to the estimation of convergence rates in the general setting of function classes that are scaled hulls of a finite-dimensional base class.
Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
Littlewort, G.C., Bartlett, M.S., Fasel, I.R., Chenu, J., Kanda, T., Ishiguro, H., Movellan, J.R.
Computer animated agents and robots bring a social dimension to human computerinteraction and force us to think in new ways about how computers could be used in daily life. Face to face communication is a real-time process operating at a time scale of less than a second. In this paper we present progress on a perceptual primitive to automatically detect frontal faces in the video stream and code them with respect to 7 dimensions in real time: neutral, anger, disgust, fear, joy, sadness, surprise. Theface finder employs a cascade of feature detectors trained with boosting techniques [13, 2]. The expression recognizer employs a novel combination of Adaboost and SVM's. The generalization performance to new subjects for a 7-way forced choice was 93.3% and 97% correct on two publicly available datasets.
Sparse Greedy Minimax Probability Machine Classification
Strohmann, Thomas R., Belitski, Andrei, Grudic, Gregory Z., DeCoste, Dennis
The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracybound Ω. The only assumptions that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally addingbasis functions (i.e.
New Algorithms for Efficient High Dimensional Non-parametric Classification
liu, Ting, Moore, Andrew W., Gray, Alexander
This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, andwe investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phaseof SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.
Approximate Analytical Bootstrap Averages for Support Vector Classifiers
Malzahn, Dörthe, Opper, Manfred
We compute approximate analytical bootstrap averages for support vector classificationusing a combination of the replica method of statistical physics and the TAP approach for approximate inference. We test our method on a few datasets and compare it with exact averages obtained by extensive Monte-Carlo sampling.
Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds
The decision functions constructed by support vector machines (SVM's) usually depend only on a subset of the training set--the so-called support vectors. We derive asymptotically sharp lower and upper bounds on the number of support vectors for several standard types of SVM's. In particular, weshow for the Gaussian RBF kernel that the fraction of support vectors tends to twice the Bayes risk for the L1-SVM, to the probability of noise for the L2-SVM, and to 1 for the LS-SVM.