Support Vector Machines
Kernel Logistic Regression and the Import Vector Machine
The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only per- forms as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM.
Classifying Single Trial EEG: Towards Brain Computer Interfacing
Driven by the progress in the field of single-trial analysis of EEG, there is a growing interest in brain computer interfaces (BCIs), i.e., systems that enable human subjects to control a computer only by means of their brain signals. In a pseudo-online simulation our BCI detects upcoming finger movements in a natural keyboard typing condition and predicts their lat- erality. This can be done on average 100โ230 ms before the respective key is actually pressed, i.e., long before the onset of EMG. Our approach is appealing for its short response time and high classification accuracy ( 96%) in a binary decision where no human training is involved. We compare discriminative classifiers like Support Vector Machines (SVMs) and different variants of Fisher Discriminant that possess favorable reg- ularization properties for dealing with high noise cases (inter-trial vari- ablity).
Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference
Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are .discrete, We compare several machine learning methods for estimating insurance premia, and test them on a large data base of car insurance policies. We find that func(cid:173) tion approximation methods that do not optimize a squared loss, like Support Vector Machines regression, do not work well in this context. Compared methods include decision trees and generalized linear models. The best results are obtained with a mixture of experts, which better identifies the least and most risky contracts, and allows to reduce the median premium by charging more to the most risky customers.
Escaping the Convex Hull with Extrapolated Vector Machines
Maximum margin classifiers such as Support Vector Machines (SVMs) critically depends upon the convex hulls of the training samples of each class, as they implicitly search for the minimum distance between the convex hulls. We propose Extrapolated Vec(cid:173) tor Machines (XVMs) which rely on extrapolations outside these convex hulls. XVMs improve SVM generalization very significantly on the MNIST [7] OCR data. They share similarities with the Fisher discriminant: maximize the inter-class margin while mini(cid:173) mizing the intra-class disparity.
Incorporating Invariances in Non-Linear Support Vector Machines
The choice of an SVM kernel corresponds to the choice of a rep(cid:173) resentation of the data in a feature space and, to improve per(cid:173) formance, it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in non(cid:173) linear kernels. We show on a digit recognition task that the pro(cid:173) posed approach is superior to the Virtual Support Vector method, which previously had been the method of choice.
A Prototype for Automatic Recognition of Spontaneous Facial Actions
Spontaneous facial expressions differ substan- tially from posed expressions, similar to how continuous, spontaneous speech differs from isolated words produced on command. Previous methods for automatic facial expression recognition assumed images were collected in controlled environments in which the subjects delib- erately faced the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. Here we explore an ap- proach based on 3-D warping of images into canonical views. We eval- uated the performance of the approach as a front-end for a spontaneous expression recognition system using support vector machines and hidden Markov models.
Adaptive Scaling for Feature Selection in SVMs
This paper introduces an algorithm for the automatic relevance determi- nation of input variables in kernelized Support Vector Machines. Rele- vance is measured by scale factors defining the input space metric, and feature selection is performed by assigning zero weights to irrelevant variables. The metric is automatically tuned by the minimization of the standard SVM empirical risk, where scale factors are added to the usual set of parameters defining the classifier. Feature selection is achieved by constraints encouraging the sparsity of scale factors. The resulting algorithm compares favorably to state-of-the-art feature selection proce- dures and demonstrates its effectiveness on a demanding facial expres- sion recognition problem.
Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
We introduce a family of classiflers based on a physical analogy to an electrostatic system of charged conductors. The family, called Coulomb classiflers, includes the two best-known support-vector machines (SVMs), the "{SVM and the C{SVM. In the electrostat- ics analogy, a training example corresponds to a charged conductor at a given location in space, the classiflcation function corresponds to the electrostatic potential function, and the training objective function corresponds to the Coulomb energy. The electrostatic framework provides not only a novel interpretation of existing algo- rithms and their interrelationships, but it suggests a variety of new methods for SVMs including kernels that bridge the gap between polynomial and radial-basis functions, objective functions that do not require positive-deflnite kernels, regularization techniques that allow for the construction of an optimal classifler in Minkowski space. Based on the framework, we propose novel SVMs and per- form simulation studies to show that they are comparable or su- perior to standard SVMs.
Rational Kernels
We introduce a general family of kernels based on weighted transduc- ers or rational relations, rational kernels, that can be used for analysis of variable-length sequences or more generally weighted automata, in appli- cations such as computational biology or speech recognition. We show that rational kernels can be computed efficiently using a general algo- rithm of composition of weighted transducers and a general single-source shortest-distance algorithm. We also describe several general families of positive definite symmetric rational kernels. These general kernels can be combined with Support Vector Machines to form efficient and power- ful techniques for spoken-dialog classification: highly complex kernels become easy to design and implement and lead to substantial improve- ments in the classification accuracy. We also show that the string kernels considered in applications to computational biology are all specific in- stances of rational kernels.
Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines
We derive multiplicative updates for solving the nonnegative quadratic programming problem in support vector machines (SVMs). The updates have a simple closed form, and we prove that they converge monotoni- cally to the solution of the maximum margin hyperplane. The updates optimize the traditionally proposed objective function for SVMs. They do not involve any heuristics such as choosing a learning rate or deciding which variables to update at each iteration. They can be used to adjust all the quadratic programming variables in parallel with a guarantee of im- provement at each iteration.