Support Vector Machines
Margin Maximizing Loss Functions
Rosset, Saharon, Zhu, Ji, Hastie, Trevor J.
Margin maximizing properties play an important role in the analysis of classi£cation models, such as boosting and support vector machines. Margin maximization is theoretically interesting because it facilitates generalization error analysis, and practically interesting because it presents a clear geometric interpretation of the models being built. We formulate and prove a suf£cient condition for the solutions of regularized loss functions to converge to margin maximizing separators, as the regularization vanishes. This condition covers the hinge loss of SVM, the exponential loss of AdaBoost and logistic regression loss. We also generalize it to multi-class classi£cation problems, and present margin maximizing multiclass versions of logistic regression and support vector machines.
Approximate Analytical Bootstrap Averages for Support Vector Classifiers
Malzahn, Dörthe, Opper, Manfred
We compute approximate analytical bootstrap averages for support vector classification using a combination of the replica method of statistical physics and the TAP approach for approximate inference. We test our method on a few datasets and compare it with exact averages obtained by extensive Monte-Carlo sampling.
Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
Bartlett, Peter L., Jordan, Michael I., Mcauliffe, Jon D.
Many classification algorithms, including the support vector machine, boosting and logistic regression, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. We characterize the statistical consequences of using such a surrogate by providing a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial bounds under the weakest possible condition on the loss function--that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we present applications of our results to the estimation of convergence rates in the general setting of function classes that are scaled hulls of a finite-dimensional base class.
Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds
The decision functions constructed by support vector machines (SVM's) usually depend only on a subset of the training set--the so-called support vectors. We derive asymptotically sharp lower and upper bounds on the number of support vectors for several standard types of SVM's. In particular, we show for the Gaussian RBF kernel that the fraction of support vectors tends to twice the Bayes risk for the L1-SVM, to the probability of noise for the L2-SVM, and to 1 for the LS-SVM.
Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression
Vollgraf, Roland, Scholz, Michael, Meinertzhagen, Ian A., Obermayer, Klaus
Nonlinear filtering can solve very complex problems, but typically involve very time consuming calculations. Here we show that for filters that are constructed as a RBF network with Gaussian basis functions, a decomposition into linear filters exists, which can be computed efficiently in the frequency domain, yielding dramatic improvement in speed. We present an application of this idea to image processing. In electron micrograph images of photoreceptor terminals of the fruit fly, Drosophila, synaptic vesicles containing neurotransmitter should be detected and labeled automatically. We use hand labels, provided by human experts, to learn a RBF filter using Support Vector Regression with Gaussian kernels. We will show that the resulting nonlinear filter solves the task to a degree of accuracy, which is close to what can be achieved by human experts. This allows the very time consuming task of data evaluation to be done efficiently.
Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
Quinlan, Michael J., Chalup, Stephan K., Middleton, Richard H.
This article addresses the issues of colour classification and collision detection as they occur in the legged league robot soccer environment of RoboCup. We show how the method of one-class classification with support vector machines (SVMs) can be applied to solve these tasks satisfactorily using the limited hardware capacity of the prescribed Sony AIBO quadruped robots. The experimental evaluation shows an improvement over our previous methods of ellipse fitting for colour classification and the statistical approach used for collision detection.
New Algorithms for Efficient High Dimensional Non-parametric Classification
liu, Ting, Moore, Andrew W., Gray, Alexander
This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.
Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
Ralaivola, Liva, d', Alché-Buc, Florence
We consider the question of predicting nonlinear time series. Kernel Dynamical Modeling (KDM), a new method based on kernels, is proposed as an extension to linear dynamical models. The kernel trick is used twice: first, to learn the parameters of the model, and second, to compute preimages of the time series predicted in the feature space by means of Support Vector Regression. Our model shows strong connection with the classic Kalman Filter model, with the kernel feature space as hidden state space. Kernel Dynamical Modeling is tested against two benchmark time series and achieves high quality predictions.
Sparse Greedy Minimax Probability Machine Classification
Strohmann, Thomas R., Belitski, Andrei, Grudic, Gregory Z., DeCoste, Dennis
The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω. The only assumptions that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e.
1-norm Support Vector Machines
Zhu, Ji, Rosset, Saharon, Tibshirani, Robert, Hastie, Trevor J.
The standard 2-norm SVM is known for its good performance in twoclass classi£cation. In this paper, we consider the 1-norm SVM. We argue that the 1-norm SVM may have some advantage over the standard 2-norm SVM, especially when there are redundant noise features. We also propose an ef£cient algorithm that computes the whole solution path of the 1-norm SVM, hence facilitates adaptive selection of the tuning parameter for the 1-norm SVM.