Support Vector Machines
A Topographic Support Vector Machine: Classification Using Local Label Configurations
The standard approach to the classification of objects is to consider the examples as independent and identically distributed (iid). In many real world settings, however, this assumption is not valid, because a topo- graphical relationship exists between the objects. In this contribution we consider the special case of image segmentation, where the objects are pixels and where the underlying topography is a 2D regular rectangular grid. We introduce a classification method which not only uses measured vectorial feature information but also the label configuration within a to- pographic neighborhood. Due to the resulting dependence between the labels of neighboring pixels, a collective classification of a set of pixels becomes necessary.
Maximal Margin Labeling for Multi-Topic Text Categorization
In this paper, we address the problem of statistical learning for multi- topic text categorization (MTC), whose goal is to choose all relevant top- ics (a label) from a given set of topics. The proposed algorithm, Max- imal Margin Labeling (MML), treats all possible labels as independent classes and learns a multi-class classifier on the induced multi-class cate- gorization problem. To cope with the data sparseness caused by the huge number of possible labels, MML combines some prior knowledge about label prototypes and a maximal margin criterion in a novel way. Experi- ments with multi-topic Web pages show that MML outperforms existing learning algorithms including Support Vector Machines. This paper addresses the problem of learning for multi-topic text categorization (MTC), whose goal is to select all topics relevant to a text from a given set of topics.
Consistency of one-class SVM and related algorithms
We determine the asymptotic limit of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator.
Two view learning: SVM-2K, Theory and Practice
Kernel methods make it relatively easy to define complex highdimensional feature spaces. This raises the question of how we can identify the relevant subspaces for a particular learning task. When two views of the same phenomenon are available kernel Canonical Correlation Analysis (KCCA) has been shown to be an effective preprocessing step that can improve the performance of classification algorithms such as the Support Vector Machine (SVM). This paper takes this observation to its logical conclusion and proposes a method that combines this two stage learning (KCCA followed by SVM) into a single optimisation termed SVM-2K. We present both experimental and theoretical analysis of the approach showing encouraging results and insights.
Large Scale Hidden Semi-Markov SVMs
We describe Hidden Semi-Markov Support Vector Machines (SHM SVMs), an extension of HM SVMs to semi-Markov chains. This allows us to predict segmentations of sequences based on segment-based features measuring properties such as the length of the segment. We propose a novel technique to partition the problem into sub-problems. The independently obtained partial solutions can then be recombined in an efficient way, which allows us to solve label sequence learning problems with several thousands of labeled sequences. We have tested our algorithm for predicting gene structures, an important problem in computational biology.
An Oracle Inequality for Clipped Regularized Risk Minimizers
We establish a general oracle inequality for clipped approximate minimizers of regularized empirical risks and apply this inequality to support vector machine (SVM) type algorithms. We then show that for SVMs using Gaussian RBF kernels for classification this oracle inequality leads to learning rates that are faster than the ones established in [9]. Finally, we use our oracle inequality to show that a simple parameter selection approach based on a validation set can yield the same fast learning rates without knowing the noise exponents which were required to be known a-priori in [9].
An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold crossvalidation error, using non-linear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be very efficiently done; often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations. .
Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three ma jor problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset.
Support Vector Machines on a Budget
The standard Support Vector Machine formulation does not provide its user with the ability to explicitly control the number of support vectors used to define the generated classifier. We present a modified version of SVM that allows the user to set a budget parameter B and focuses on minimizing the loss attained by the B worst-classified examples while ignoring the remaining examples. This idea can be used to derive sparse versions of both L1-SVM and L2-SVM. Technically, we obtain these new SVM variants by replacing the 1-norm in the standard SVM for- mulation with various interpolation-norms. We also adapt the SMO optimization algorithm to our setting and report on some preliminary experimental results.
Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype
Starting with the work of Jaakkola and Haussler, a variety of approaches have been proposed for coupling domain-specific generative models with statistical learning methods. The link is established by a kernel function which provides a similarity measure based inherently on the underlying model. In computational biology, the full promise of this framework has rarely ever been exploited, as most kernels are derived from very generic models, such as sequence profiles or hidden Markov models. Here, we introduce the MTreeMix kernel, which is based on a generative model tailored to the underlying biological mechanism. We compare this novel kernel to a standard, evolution-agnostic amino acid encoding in the prediction of HIV drug resistance from genotype, using support vector regression.