 Support Vector Machines


A Topographic Support Vector Machine: Classification Using Local Label Configurations

Neural Information Processing Systems

The standard approach to the classification of objects is to consider the examples as independent and identically distributed (iid). In many real world settings, however, this assumption is not valid, because a topographical relationship exists between the objects. In this contribution we consider the special case of image segmentation, where the objects are pixels and where the underlying topography is a 2D regular rectangular grid. We introduce a classification method which uses not only measured vectorial feature information but also the label configuration within a topographic neighborhood. Due to the resulting dependence between the labels of neighboring pixels, a collective classification of a set of pixels becomes necessary. We propose a new method called 'Topographic Support Vector Machine' (TSVM), which is based on a topographic kernel and a self-consistent solution to the label assignment shown to be equivalent to a recurrent neural network. The performance of the algorithm is compared to a conventional SVM on a cell image segmentation task.


Parallel Support Vector Machines: The Cascade SVM

Neural Information Processing Systems

We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are combined and filtered again in a 'Cascade' of SVMs, until the global optimum is reached. The Cascade SVM can be spread over multiple processors with minimal communication overhead and requires far less memory, since the kernel matrices are much smaller than for a regular SVM. Convergence to the global optimum is guaranteed with multiple passes through the Cascade, but already a single pass provides good generalization. A single pass is 5x to 10x faster than a regular SVM for problems of 100,000 vectors when implemented on a single processor. Parallel implementations on a cluster of 16 processors were tested with over 1 million vectors (2-class problems), converging in a day or two, while a regular SVM never converged in over a week.
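
The split, train, merge, and retrain flow described in this abstract can be sketched roughly as follows. This is a minimal illustration of a single pass only, not the authors' implementation: the names `cascade_pass` and `svs_1d` are hypothetical, the pairwise merge order is an assumption, and the "solver" is a toy that returns the exact support vectors of a 1-D hard-margin SVM on separable data (the innermost point of each class). A real solver would replace it.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (1-D feature, label in {-1, +1})
Solver = Callable[[List[Example]], List[Example]]


def svs_1d(data: List[Example]) -> List[Example]:
    """Toy solver (assumption): for separable 1-D data with every -1 example
    left of every +1 example, the hard-margin support vectors are the
    largest -1 point and the smallest +1 point."""
    neg = [x for x, y in data if y == -1]
    pos = [x for x, y in data if y == +1]
    out: List[Example] = []
    if neg:
        out.append((max(neg), -1))
    if pos:
        out.append((min(pos), +1))
    return out


def cascade_pass(data: List[Example], n_subsets: int, solver: Solver) -> List[Example]:
    """One pass through the cascade: train on each subset, keep only the
    support vectors, then merge results pairwise and retrain until a
    single set of support vectors remains."""
    # First layer: each subset is optimized separately.
    layer = [solver(data[i::n_subsets]) for i in range(n_subsets)]
    # Later layers: combine pairs of partial results and filter again.
    while len(layer) > 1:
        merged = []
        for i in range(0, len(layer), 2):
            pair = layer[i] + (layer[i + 1] if i + 1 < len(layer) else [])
            merged.append(solver(pair))
        layer = merged
    return layer[0]


data = [(-3.0, -1), (-1.0, -1), (-0.5, -1), (0.7, 1),
        (2.0, 1), (4.0, 1), (-2.0, -1), (1.5, 1)]
support_vectors = cascade_pass(data, n_subsets=4, solver=svs_1d)
```

In this toy setting one pass already recovers the support vectors of the full problem. In general the abstract states that multiple passes are needed to guarantee the global optimum; the feedback between passes is not shown here.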


Breaking SVM Complexity with Cross-Training

Neural Information Processing Systems

We propose to selectively remove examples from the training set using probabilistic estimates related to editing algorithms (Devijver and Kittler, 1982). This heuristic procedure aims at creating a separable distribution of training examples with minimal impact on the position of the decision boundary. It breaks the linear dependency between the number of SVs and the number of training examples, and sharply reduces the complexity of SVMs during both the training and prediction stages.


Class-size Independent Generalization Analysis of Some Discriminative Multi-Category Classification

Neural Information Processing Systems

We consider the problem of deriving class-size independent generalization bounds for some regularized discriminative multi-category classification methods. In particular, we obtain an expected generalization bound for a standard formulation of multi-category support vector machines. Based on the theoretical result, we argue that the formulation over-penalizes misclassification error, which in theory may lead to poor generalization performance. A remedy, based on a generalization of multi-category logistic regression (conditional maximum entropy), is then proposed, and its theoretical properties are examined.


Machine Learning Applied to Perception: Decision Images for Gender Classification

Neural Information Processing Systems

We study gender discrimination of human faces using a combination of psychophysical classification and discrimination experiments together with methods from machine learning. We reduce the dimensionality of a set of face images using principal component analysis, and then train a set of linear classifiers on this reduced representation (linear support vector machines (SVMs), relevance vector machines (RVMs), Fisher linear discriminant (FLD), and prototype (prot) classifiers) using human classification data. Because we combine a linear preprocessor with linear classifiers, the entire system acts as a linear classifier, allowing us to visualise the decision-image corresponding to the normal vector of the separating hyperplanes (SH) of each classifier. We predict that the female-to-maleness transition along the normal vector for classifiers closely mimicking human classification (SVM and RVM [1]) should be faster than the transition along any other direction. A psychophysical discrimination experiment using the decision images as stimuli is consistent with this prediction.
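
The claim that a linear preprocessor followed by a linear classifier is itself a linear classifier can be checked with a small sketch. Assuming the PCA step is z = P(x - mu) and the classifier scores w·z + b, the combined score is (Pᵀw)·x + (b - (Pᵀw)·mu), so Pᵀw plays the role of the decision image in pixel space. All names and the toy numbers below are illustrative assumptions, not values from the paper.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decision_image(P, w):
    """Map the classifier normal w (k dims) back to input space (d dims): P^T w."""
    return [sum(P[k][j] * w[k] for k in range(len(P))) for j in range(len(P[0]))]


def two_stage_score(P, mu, w, b, x):
    """Centre x, project with P, then apply the linear classifier."""
    centred = [xi - mi for xi, mi in zip(x, mu)]
    z = [dot(row, centred) for row in P]
    return dot(w, z) + b


def collapsed_score(P, mu, w, b, x):
    """Same score computed directly in input space with a single normal vector."""
    v = decision_image(P, w)
    return dot(v, x) + (b - dot(v, mu))


# Toy example: 4 "pixels" reduced to 2 components (made-up numbers).
P = [[0.5, 0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5]]
mu = [1.0, 2.0, 3.0, 4.0]
w, b = [2.0, -1.0], 0.25
x = [0.0, 1.0, 5.0, 2.0]

s_two_stage = two_stage_score(P, mu, w, b, x)
s_collapsed = collapsed_score(P, mu, w, b, x)
```

Both scores agree up to floating-point rounding, which is what allows the normal vector to be displayed as an image of the same size as the input faces.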


Fast Rates to Bayes for Kernel Machines

Neural Information Processing Systems

We establish learning rates to the Bayes risk for support vector machines (SVMs) with hinge loss. In particular, for SVMs with Gaussian RBF kernels we propose a geometric condition for distributions which can be used to determine approximation properties of these kernels. Finally, we compare our methods with a recent paper of G. Blanchard et al.


Density Level Detection is Classification

Neural Information Processing Systems

We show that anomaly detection can be interpreted as a binary classification problem. Using this interpretation we propose a support vector machine (SVM) for anomaly detection. We then present some theoretical results which include consistency and learning rates. Finally, we experimentally compare our SVM with the standard one-class SVM.


A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities

Neural Information Processing Systems

We devise and experiment with a dynamical kernel-based system for tracking hand movements from neural activity. The state of the system corresponds to the hand location, velocity, and acceleration, while the system's inputs are the instantaneous spike rates. The system's state dynamics is defined as a combination of a linear mapping from the previous estimated state and a kernel-based mapping tailored for modeling neural activities. In contrast to generative models, the activity-to-state mapping is learned using discriminative methods by minimizing a noise-robust loss function. We use this approach to predict hand trajectories on the basis of neural activity in motor cortex of behaving monkeys and find that the proposed approach is more accurate than both a static approach based on support vector regression and the Kalman filter.
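
The state update described in this abstract, a linear map of the previous state plus a kernel-based map of the current spike rates, might look roughly like the sketch below. The Gaussian kernel, the additive form, and all names (`rbf`, `step`, `A`, `support`, `coeffs`) are assumptions made for illustration; the abstract does not specify the kernel or how the two terms are combined, and the learning of the coefficients is not shown.

```python
import math


def rbf(u, v, gamma=1.0):
    """Gaussian kernel between two spike-rate vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def step(prev_state, spikes, A, support, coeffs, gamma=1.0):
    """One update: new_state = A @ prev_state + sum_i coeffs[i] * k(support[i], spikes).

    A        : square matrix (list of rows) acting on the previous state
    support  : stored spike-rate vectors from training (hypothetical)
    coeffs   : one coefficient vector (state dimension) per support vector
    """
    linear = [sum(a * s for a, s in zip(row, prev_state)) for row in A]
    k = [rbf(sv, spikes, gamma) for sv in support]
    kernel = [sum(c[j] * kj for c, kj in zip(coeffs, k))
              for j in range(len(prev_state))]
    return [l + m for l, m in zip(linear, kernel)]


# Toy 2-D state and 2 recorded units (made-up numbers).
A = [[0.5, 0.0],
     [0.0, 0.5]]
support = [[1.0, 0.0]]
coeffs = [[2.0, -1.0]]
new_state = step([2.0, 4.0], [1.0, 0.0], A, support, coeffs)
```

Iterating `step` over a sequence of spike-rate vectors yields a trajectory estimate, in contrast to a static regressor that would map each spike-rate vector to a state independently.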


Kernel Methods for Implicit Surface Modeling

Neural Information Processing Systems

We describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes.


A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound

Neural Information Processing Systems

A novel linear feature selection algorithm is presented based on the global minimization of a data-dependent generalization error bound. Feature selection and scaling algorithms often lead to non-convex optimization problems, which in many previous approaches were addressed through gradient descent procedures that can only guarantee convergence to a local minimum. We propose an alternative approach, whereby the global solution of the non-convex optimization problem is derived via an equivalent optimization problem. Moreover, the convex optimization task is reduced to a conic quadratic programming problem for which efficient solvers are available. Highly competitive numerical results on both artificial and real-world data sets are reported.