Support Vector Machines
Adaptive Sparseness Using Jeffreys Prior
In this paper we introduce a new sparseness-inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparseness-controlling hyperparameters.
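This is a rough illustration of how a Jeffreys-type hyperprior can produce sparseness without a tuning parameter. The sketch runs an EM-style iteration for linear regression. Several details are assumptions made for illustration and do not come from the abstract: the update rule beta <- U (sigma2 I + U X^T X U)^{-1} U X^T y with U = diag(|beta|), the least-squares initialization, the residual-based noise estimate, and the function name `jeffreys_sparse_regression`.

```python
import numpy as np

def jeffreys_sparse_regression(X, y, n_iter=200):
    """EM-style sparse linear regression under an assumed Jeffreys hyperprior.

    Assumed update: beta <- U (sigma2 I + U X^T X U)^{-1} U X^T y, U = diag(|beta|).
    No sparseness-controlling hyperparameter appears anywhere.
    """
    n, d = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    # Initialize from the ordinary least-squares fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # Noise variance re-estimated from the current residuals.
        sigma2 = np.mean((y - X @ beta) ** 2)
        U = np.diag(np.abs(beta))
        # A is positive definite because sigma2 > 0, so the solve is well posed.
        A = sigma2 * np.eye(d) + U @ XtX @ U
        beta = U @ np.linalg.solve(A, U @ Xty)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
true_beta = np.zeros(10)
true_beta[[0, 3]] = [2.0, -3.0]
y = X @ true_beta + 0.1 * rng.standard_normal(100)
beta = jeffreys_sparse_regression(X, y)
```

Under these assumptions, weights whose least-squares estimate is small relative to the noise level shrink toward zero. The large weights stay close to their least-squares values.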
Covariance Kernels from Bayesian Generative Models
We propose the framework of mutual information kernels for learning covariance kernels, as used in support vector machines and Gaussian process classifiers, from unlabeled task data using Bayesian techniques. We describe an implementation of this framework that uses variational Bayesian mixtures of factor analyzers to attack classification problems in high-dimensional spaces where labeled data are sparse but unlabeled data are abundant.
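The following sketch shows only the general idea of deriving a covariance kernel from a Bayesian generative model fit to unlabeled data. The definition K(x, x') = E over the posterior of p(x | theta) p(x' | theta), followed by cosine normalization, is an assumption in the spirit of a mutual information kernel and may not match the paper's exact definition. The paper's variational mixtures of factor analyzers are replaced here by a 1-D Gaussian-mean model with a conjugate prior. The names `gaussian_lik`, `posterior_kernel`, and `tau2` are hypothetical.

```python
import numpy as np

def gaussian_lik(x, theta):
    # p(x | theta): unit-variance Gaussian with mean theta (toy generative model).
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)

def posterior_kernel(points, unlabeled, tau2=10.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_{theta | unlabeled}[p(x|theta) p(x'|theta)], normalized."""
    rng = np.random.default_rng(seed)
    # Conjugate Gaussian posterior over theta given the unlabeled data, prior N(0, tau2).
    s2 = 1.0 / (1.0 / tau2 + len(unlabeled))
    m = s2 * np.sum(unlabeled)
    thetas = rng.normal(m, np.sqrt(s2), size=n_samples)
    # Phi[i, s] = p(x_i | theta_s); Phi Phi^T / S is positive semidefinite by construction.
    phi = gaussian_lik(points[:, None], thetas[None, :])
    K = phi @ phi.T / n_samples
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

rng = np.random.default_rng(1)
unlabeled = rng.normal(1.0, 1.0, size=5)
points = np.array([-2.0, 0.0, 1.0, 3.0])
K = posterior_kernel(points, unlabeled)
```

This is only a structural illustration. With a single-mean model and a lot of unlabeled data, the posterior concentrates and the normalized kernel approaches a constant, so a richer model is needed for a useful kernel.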
Dynamic Time-Alignment Kernel in Support Vector Machine
A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition, such as speech recognition, is developed by incorporating the idea of non-linear time alignment into the kernel function. Since the time-alignment operation for sequential patterns is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modification. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments on hand-segmented phoneme recognition. Preliminary experimental results show recognition performance comparable to that of hidden Markov models (HMMs).
1 Introduction
The Support Vector Machine (SVM) [1] is one of the latest and most successful statistical pattern classifiers that utilize a kernel technique [2, 3].
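The abstract does not spell out the kernel. The following is a minimal sketch that assumes a DTW-style recursion:

* A frame-level RBF kernel is accumulated along the best monotone alignment path.
* Diagonal steps are weighted twice, so every path carries total weight n + m.
* The result is normalized by n + m.

The step weights, the normalization, and the names `dtak` and `rbf` are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # Local frame-level kernel with k(a, a) = 1 and 0 < k <= 1.
    return np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def dtak(X, Y, local=rbf):
    """Time-alignment kernel computed by a DTW-style dynamic program (sketch)."""
    n, m = len(X), len(Y)
    G = np.full((n + 1, m + 1), -np.inf)
    G[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = local(X[i - 1], Y[j - 1])
            # Diagonal moves count twice, so every path from (0, 0) has weight n + m.
            G[i, j] = max(G[i - 1, j] + k,
                          G[i - 1, j - 1] + 2.0 * k,
                          G[i, j - 1] + k)
    return G[n, m] / (n + m)

x_seq = np.array([[0.0], [1.0], [2.0]])
stretched_seq = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])  # time-warped copy
score = dtak(x_seq, stretched_seq)
```

Under these assumptions, a time-stretched copy of a sequence scores 1, which is the maximum when the local kernel is bounded by 1. A kernel defined through a maximum over alignments is not guaranteed to be positive semidefinite in general, and this sketch makes no such claim.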
Adaptive Nearest Neighbor Classification Using Support Vector Machines
The nearest neighbor technique is a simple and appealing method for classification problems. It relies on the assumption of locally constant class-conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples because of the curse of dimensionality. We propose a technique that computes a locally flexible metric by means of Support Vector Machines (SVMs). The maximum margin boundary found by the SVM is used to determine the most discriminant direction over the query's neighborhood. This direction provides a local weighting scheme for input features.
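To show how a discriminant direction can act as a feature-weighting scheme, this sketch weights each feature by the magnitude of the corresponding component of an SVM boundary normal `w`. That vector is assumed to be computed already, near the query, by any SVM solver; for a nonlinear SVM one would use a local gradient of the decision function. The weights are then used in a nearest neighbor vote. The normalization |w_j| / sum |w| and the names `feature_weights` and `weighted_knn` are hypothetical, not the paper's exact weighting.

```python
import numpy as np

def feature_weights(w):
    # Hypothetical scheme: weight feature j by |w_j|, normalized to sum to 1,
    # where w is the (locally linear) SVM boundary normal near the query.
    a = np.abs(np.asarray(w, dtype=float))
    return a / a.sum()

def weighted_knn(X, y, query, weights, k=1):
    # Weighted squared Euclidean distance: features along the discriminant direction dominate.
    d2 = (((X - query) ** 2) * weights).sum(axis=1)
    nearest = np.argsort(d2)[:k]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[1.0, 5.0], [-1.0, 0.0]])
y = np.array([1, -1])
query = np.array([0.8, 0.0])
plain = weighted_knn(X, y, query, np.array([0.5, 0.5]))            # uniform metric
adaptive = weighted_knn(X, y, query, feature_weights([1.0, 0.0]))  # SVM-derived metric
```

In this toy case, the uniform metric is dominated by the irrelevant second feature and returns -1. The weighted metric attends only to the discriminant first feature and returns +1.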
A Sequence Kernel and its Application to Speaker Recognition
A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived by assuming that the observations in a sequence are independent and by using a mean-squared error training criterion. The use of an explicit-expansion kernel reduces classifier model size and computation dramatically: in our application, both are one hundred times smaller. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training with standard support vector machine methodology gives accuracy that significantly exceeds that of state-of-the-art mean-squared error training on a speaker recognition task.
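The sketch below shows why an explicit expansion shrinks the model. When K(X, Y) is an inner product of explicit feature vectors, the SVM sum over support sequences collapses into a single weight vector. Several details are assumptions for illustration:

* the monomial expansion up to degree 2;
* the frame-averaged representation, motivated by the independence assumption;
* the background correlation matrix `R` with a small ridge;
* the coefficients `coef`, which stand in for the trained alpha_i y_i;
* all function names.

```python
import itertools
import numpy as np

def expand(x, degree=2):
    # Explicit polynomial expansion: all monomials of x up to `degree`, plus a constant.
    terms = [1.0]
    for p in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(len(x)), p):
            terms.append(float(np.prod(x[list(idx)])))
    return np.array(terms)

def mean_expansion(seq, degree=2):
    # Treating frames as independent, a sequence maps to its average expansion.
    return np.mean([expand(f, degree) for f in seq], axis=0)

def sequence_kernel(bx, by, R):
    # K(X, Y) = b_X^T R^{-1} b_Y, with R a correlation matrix from background data.
    return bx @ np.linalg.solve(R, by)

rng = np.random.default_rng(0)
B = np.array([expand(f) for f in rng.standard_normal((500, 3))])
R = B.T @ B / len(B) + 1e-6 * np.eye(B.shape[1])
train = [mean_expansion(rng.standard_normal((40, 3))) for _ in range(4)]
coef = np.array([0.5, -0.2, 0.7, -1.0])        # hypothetical alpha_i * y_i values
test_b = mean_expansion(rng.standard_normal((60, 3)))

# Kernel-form SVM score: needs every support sequence at test time.
kernel_score = sum(c * sequence_kernel(b, test_b, R) for c, b in zip(coef, train))
# Collapsed model: a single vector w, one dot product per test sequence.
w = np.linalg.solve(R, sum(c * b for c, b in zip(coef, train)))
collapsed_score = w @ test_b
```

The two scores agree because R is symmetric. The collapsed model stores one vector (here of length 10) instead of all support sequences, which is the kind of saving the abstract reports.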
Asymptotic Universality for Learning Curves of Support Vector Machines
Using methods of statistical physics, we investigate the role of model complexity in learning with support vector machines (SVMs). We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules: in contrast to common theoretical beliefs, they are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find universal asymptotics of the learning curves, which depend only on the target rule and not on the SVM kernel.
Batch Value Function Approximation via Support Vectors
We present three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning. One formulation is based on SVM regression, the second is based on the Bellman equation, and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments on a difficult synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily adjust the complexity of the function approximator to fit the complexity of the value function.
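To make the combination of linear programming and the kernel trick concrete, here is one way a Bellman-residual formulation could be written as a linear program. This is a sketch, not the paper's exact formulation. The following are assumptions of the sketch:

* the sampled transitions (s_t, r_t, s_t');
* the discount γ;
* the slack weight C;
* the split α = α⁺ − α⁻.

```latex
\begin{aligned}
\min_{\alpha^{+},\,\alpha^{-},\,b,\,\xi}\quad & \sum_{i}\bigl(\alpha_i^{+}+\alpha_i^{-}\bigr) + C\sum_{t}\xi_t \\
\text{s.t.}\quad & -\xi_t \le V(s_t) - r_t - \gamma\,V(s_t') \le \xi_t, \qquad \xi_t \ge 0, \\
& \alpha_i^{+},\ \alpha_i^{-} \ge 0, \qquad
V(s) = \sum_i \bigl(\alpha_i^{+}-\alpha_i^{-}\bigr)\,k(s_i, s) + b .
\end{aligned}
```

Because V is linear in α and b, every constraint is linear. The ℓ1 penalty on α tends to drive many coefficients to exactly zero, which keeps the number of support vectors small.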
Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
We propose a framework based on a parametric quadratic programming (QP) technique to solve the support vector machine (SVM) training problem. This framework can be specialized to obtain two SVM optimization methods. The first solves the fixed-bias problem, while the second starts with an optimal solution for a fixed-bias problem and adjusts the bias until the optimal value is found. The latter method can be applied in conjunction with any other existing technique that obtains a fixed-bias solution. Moreover, the second method can also be used independently to solve the complete SVM training problem.
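With the bias b held fixed, the SVM dual loses its equality constraint and becomes a box-constrained QP. The sketch below solves that fixed-bias problem by exact coordinate ascent. It then adjusts b by bisection on the sign of sum_i a_i(b) y_i, which is non-increasing in b because the fixed-bias optimum is convex in b. Both solvers stand in for the paper's parametric QP method and are not taken from it. The search interval, `C`, and the function names are assumptions.

```python
import numpy as np

def solve_fixed_bias(K, y, b, C=10.0, sweeps=300):
    """SVM dual with the bias b held fixed: a box-constrained QP.

    max_a  sum_i a_i (1 - y_i b) - 1/2 a^T Q a,  0 <= a_i <= C,  Q_ij = y_i y_j K_ij.
    Solved by exact coordinate ascent (an assumed solver).
    """
    Q = np.outer(y, y) * K
    a = np.zeros(len(y))
    for _ in range(sweeps):
        for i in range(len(y)):
            grad = 1.0 - y[i] * b - Q[i] @ a
            # Exact maximizer along coordinate i, clipped to the box [0, C].
            a[i] = min(C, max(0.0, a[i] + grad / Q[i, i]))
    return a

def solve_bias(K, y, C=10.0, lo=-10.0, hi=10.0, steps=50):
    # sum_i a_i(b) y_i is non-increasing in b; bisect for its sign change.
    # Assumes the optimal bias lies in [lo, hi].
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if solve_fixed_bias(K, y, mid, C) @ y > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    return solve_fixed_bias(K, y, b, C), b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                      # linear kernel
a, b = solve_bias(K, y)
decision = (a * y) @ K + b
```

Near the optimal bias the fixed-bias dual is degenerate, because its optimal a is not unique there. Coordinate ascent therefore converges slowly, and the recovered b is only approximate. For this symmetric toy set it lands close to the exact value 0.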
A kernel method for multi-labelled classification
This article presents a Support Vector Machine (SVM)-like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems, but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach based on a large-margin ranking system that shares many properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
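This is a minimal sketch of a large-margin ranking criterion for multi-label data. Every relevant label should outscore every irrelevant label by a margin of 1. The paper's system is kernel-based; this version is linear and is trained by full-batch subgradient descent on the pairwise hinge loss plus an L2 penalty, which is an assumed solver. It also omits any step that turns rankings into predicted label sets. The function name and step sizes are hypothetical.

```python
import numpy as np

def train_rank_model(X, Y, lam=0.01, lr=0.1, n_iter=300):
    """Linear multi-label ranker: relevant labels should outscore irrelevant ones by 1.

    X: (n, d) inputs; Y: (n, L) 0/1 relevance matrix.
    """
    n, d = X.shape
    L = Y.shape[1]
    W = np.zeros((L, d))
    for _ in range(n_iter):
        grad = lam * W                   # gradient of the L2 penalty (a new array)
        S = X @ W.T                      # (n, L) label scores
        for i in range(n):
            rel = np.flatnonzero(Y[i] == 1)
            irr = np.flatnonzero(Y[i] == 0)
            for k in rel:
                for l in irr:
                    if S[i, k] - S[i, l] < 1.0:   # ranking margin violated
                        grad[k] -= X[i] / n
                        grad[l] += X[i] / n
        W -= lr * grad
    return W

X = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
Y = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
W = train_rank_model(X, Y)
scores = X @ W.T
```

Unlike a decomposition into independent two-class problems, the loss couples all labels of an example through the pairwise comparisons.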
A Parallel Mixture of SVMs for Very Large Scale Problems
Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems, but they suffer from the complexity of their training algorithm, which is at least quadratic with respect to the number of examples. Hence, it is hopeless to try to solve real-life problems having more than a few hundred thousand examples with SVMs. The present paper proposes a new mixture of SVMs that can be easily implemented in parallel and where each SVM is trained on a small subset of the whole dataset. Experiments on a large benchmark dataset (Forest) as well as a difficult speech database yielded significant time improvements (time complexity appears empirically to grow locally linearly with the number of examples). In addition, and surprisingly, a significant improvement in generalization was observed on Forest.
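The sketch below illustrates the structural idea: split the data into subsets, train one SVM expert per subset independently, and combine their outputs. It makes several simplifying assumptions:

* Experts are linear SVMs trained by subgradient descent.
* A thread pool stands in for true parallel hardware.
* The paper's learned gater is replaced by a plain average of expert outputs.
* Any reassignment of examples between experts is omitted.

All names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=300):
    # Full-batch subgradient descent on the regularized hinge loss.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        viol = y * (X @ w + b) < 1.0
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(y)
        grad_b = -y[viol].sum() / len(y)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def train_mixture(X, y, n_experts=4, seed=0):
    # Each expert sees only a random subset, so experts can train independently.
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), n_experts)
    with ThreadPoolExecutor(max_workers=n_experts) as pool:
        return list(pool.map(lambda idx: train_linear_svm(X[idx], y[idx]), parts))

def predict(experts, X):
    # Uniform average of expert outputs: a stand-in for a learned gater.
    scores = np.mean([X @ w + b for w, b in experts], axis=0)
    return np.where(scores >= 0, 1.0, -1.0)

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 2))
y = np.where(X[:, 0] + X[:, 1] >= 0, 1.0, -1.0)
experts = train_mixture(X, y)
acc = np.mean(predict(experts, X) == y)
```

Because each expert only sees n / n_experts examples, the per-expert training cost stays small as the dataset grows. That is the property the abstract exploits; a learned gater would additionally let experts specialize.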