Support Vector Machines
Support Vector Machines for Multiple-Instance Learning
Andrews, Stuart, Tsochantaridis, Ioannis, Hofmann, Thomas
This paper presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristically. Our generalization of SVMs makes a state-of-the-art classification technique, including nonlinear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods. We present experimental results on a pharmaceutical dataset and on applications in automated image indexing and document categorization. 1 Introduction Multiple-instance learning (MIL) [4] is a generalization of supervised classification in which training class labels are associated with sets of patterns, or bags, instead of individual patterns. While every pattern may possess an associated true label, it is assumed that pattern labels are only indirectly accessible through labels attached to bags.
Adaptive Scaling for Feature Selection in SVMs
Grandvalet, Yves, Canu, Stรฉphane
This paper introduces an algorithm for the automatic relevance determination ofinput variables in kernelized Support Vector Machines. Relevance is measured by scale factors defining the input space metric, and feature selection is performed by assigning zero weights to irrelevant variables. The metric is automatically tuned by the minimization of the standard SVM empirical risk, where scale factors are added to the usual set of parameters defining the classifier. Feature selection is achieved by constraints encouraging the sparsity of scale factors. The resulting algorithm compares favorably to state-of-the-art feature selection procedures anddemonstrates its effectiveness on a demanding facial expression recognition problem.
Knowledge-Based Support Vector Machine Classifiers
Fung, Glenn M., Mangasarian, Olvi L., Shavlik, Jude W.
Prior knowledge in the form of multiple polyhedral sets, each belonging toone of two categories, is introduced into a reformulation of a linear support vector machine classifier. The resulting formulation leadsto a linear program that can be solved efficiently. Real world examples, from DNA sequencing and breast cancer prognosis, demonstrate the effectiveness of the proposed method. Numerical results show improvement in test set accuracy after the incorporation ofprior knowledge into ordinary, data-based linear support vector machine classifiers. One experiment also shows that a linear classifier,based solely on prior knowledge, far outperforms the direct application of prior knowledge rules to classify data.
The Stability of Kernel Principal Components Analysis and its Relation to the Process Eigenspectrum
Williams, Christopher, Shawe-taylor, John S.
I. Williams School of Informatics University of Edinburgh c.k.i.williams ed.ac.uk Abstract In this paper we analyze the relationships between the eigenvalues of the m x m Gram matrix K for a kernel k(ยท, .) We bound the differences betweenthe two spectra and provide a performance bound on kernel peA. 1 Introduction Over recent years there has been a considerable amount of interest in kernel methods for supervised learning (e.g. Support Vector Machines and Gaussian Process predict ion)and for unsupervised learning (e.g. In this paper we study the stability of the subspace of feature space extracted by kernel peA with respect to the sample of size m, and relate this to the feature space that would be extracted in the infinite sample-size limit. This analysis essentially "lifts" into (a potentially infinite dimensional) feature space an analysis which can also be carried out for peA, comparing the k-dimensional eigenspace extracted from a sample covariance matrix and the k-dimensional eigenspace extracted from the population covariance matrix, and comparing the residuals from the k-dimensional compression for the m-sample and the population.
Dyadic Classification Trees via Structural Risk Minimization
Classification trees are one of the most popular types of classifiers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classification trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), are near optimal (in a minimax sense) for a very broad range of classification problems.This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near optimal performance isattained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from theoretical resultson structural risk minimization, on which the pruning rule for DCTs is based.
Categorization by Learning and Combining Object Parts
Heisele, Bernd, Serre, Thomas, Pontil, Massimiliano, Vetter, Thomas, Poggio, Tomaso
We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
We propose a framework based on a parametric quadratic programming (QP) technique to solve the support vector machine (SVM) training problem. This framework, can be specialized to obtain two SVM optimization methods. The first solves the fixed bias problem, while the second starts with an optimal solution for a fixed bias problem and adjusts the bias until the optimal value is found. The later method can be applied in conjunction with any other existing technique which obtains a fixed bias solution. Moreover, the second method can also be used independently to solve the complete SVM training problem. A combination of these two methods is more flexible than each individual method and, among other things, produces an incremental algorithm which exactly solve the 1-Norm Soft Margin SVM optimization problem. Applying Selective Sampling techniques may further boost convergence.
Batch Value Function Approximation via Support Vectors
Dietterich, Thomas G., Wang, Xin
Virtually all existing work on value function approximation and policy-gradient methods starts with a parameterized formula for the value function or policy and thenseeks to find the best policythat canbe representedinthat parameterizedform. This can give rise to very difficult search problems for which the Bellman equation is of little or no use. In this paper, we take a different approach: rather than fixing the form of the function approximator and searching for a representable policy, we instead identify a good policy and then search for a function approximator that can represent it. Our approach exploits the ability of mathematical programming to represent a variety of constraints including those that derive from supervised learning, from advantage learning (Baird, 1993), and from the Bellman equation. By combining the kernel trick with mathematical programming, we obtain a function approximator that seeks to find the smallest number of support vectors sufficient to represent the desired policy.
Speech Recognition using SVMs
An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space.