Support Vector Machines
Asymptotic Universality for Learning Curves of Support Vector Machines
Opper, Manfred, Urbanczik, Robert
Using methods of Statistical Physics, we investigate the role of model complexity in learning with support vector machines (SVMs). We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules, which, in contrast to common theoretical beliefs, are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find universal asymptotics of the learning curves, which depend only on the target rule and not on the SVM kernel.
1 Introduction
Powerful systems for data inference, such as neural networks, implement complex input-output relations by learning from example data. The price one has to pay for the flexibility of these models is the need to choose the proper model complexity for a given task, i.e. the system architecture which gives good generalization ability for novel data. This has also become an important problem for support vector machines [1].
Classifying Single Trial EEG: Towards Brain Computer Interfacing
Blankertz, Benjamin, Curio, Gabriel, Müller, Klaus-Robert
Driven by the progress in the field of single-trial analysis of EEG, there is a growing interest in brain computer interfaces (BCIs), i.e., systems that enable human subjects to control a computer only by means of their brain signals. In a pseudo-online simulation our BCI detects upcoming finger movements in a natural keyboard typing condition and predicts their laterality. This can be done on average 100-230 ms before the respective key is actually pressed, i.e., long before the onset of EMG. Our approach is appealing for its short response time and high classification accuracy (96%) in a binary decision where no human training is involved. We compare discriminative classifiers like Support Vector Machines (SVMs) and different variants of Fisher Discriminant that possess favorable regularization properties for dealing with high-noise cases (inter-trial variability).
Adaptive Sparseness Using Jeffreys Prior
In this paper we introduce a new sparseness-inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparseness-controlling hyper-parameters.
A kernel method for multi-labelled classification
Elisseeff, André, Weston, Jason
This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares many properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
Adaptive Nearest Neighbor Classification Using Support Vector Machines
Domeniconi, Carlotta, Gunopulos, Dimitrios
The nearest neighbor technique is a simple and appealing method to address classification problems. It relies on the assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. We propose a technique that computes a locally flexible metric by means of Support Vector Machines (SVMs). The maximum margin boundary found by the SVM is used to determine the most discriminant direction over the query's neighborhood. This direction provides a local weighting scheme for input features.
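The local weighting idea can be illustrated with a minimal sketch. It assumes the most discriminant direction at the query is already available as a gradient vector of the SVM decision function; the normalisation in `feature_weights` and all names below are hypothetical choices for illustration, not the weighting scheme of the paper.

```python
import math

def feature_weights(gradient):
    """Turn a discriminant direction into per-feature weights that sum to 1.
    Assumption: weights are proportional to the absolute gradient components."""
    mags = [abs(g) for g in gradient]
    total = sum(mags)
    if total == 0:
        # No preferred direction: fall back to uniform weights.
        return [1.0 / len(mags)] * len(mags)
    return [m / total for m in mags]

def weighted_distance(a, b, weights):
    """Euclidean distance with each squared coordinate difference scaled by its weight."""
    return math.sqrt(sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b)))

def nearest_label(query, points, labels, weights):
    """Label of the training point closest to the query under the weighted metric."""
    best = min(range(len(points)),
               key=lambda i: weighted_distance(query, points[i], weights))
    return labels[best]

# Hypothetical linear decision function f(x) = x0, so its gradient is (1, 0):
# only the first feature is discriminant near the query.
weights = feature_weights([1.0, 0.0])
points = [[0.9, 5.0], [-0.2, 0.1]]
labels = [+1, -1]
query = [1.0, 0.0]

adaptive = nearest_label(query, points, labels, weights)
uniform = nearest_label(query, points, labels, [0.5, 0.5])
```

In this toy setting the adaptive metric ignores the irrelevant second feature and picks the neighbor that agrees with the query along the discriminant direction, while the uniform metric is misled by the second feature.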
Dynamic Time-Alignment Kernel in Support Vector Machine
Shimodaira, Hiroshi, Noma, Ken-ichi, Nakai, Mitsuru, Sagayama, Shigeki
A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating the idea of nonlinear time alignment into the kernel function. Since the time-alignment operation for sequential patterns is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs).
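A time-alignment kernel of this kind can be sketched with a dynamic-programming recursion in the style of dynamic time warping. The step weights (1 for horizontal and vertical moves, 2 for diagonal moves), the normalisation by the summed sequence lengths, and the RBF frame kernel are assumptions made for this illustration; the abstract does not specify them, and no claim is made here that the result is positive definite.

```python
import math

def rbf(a, b, gamma=1.0):
    """Frame-level RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def dtak(xs, ys, frame_kernel=rbf):
    """Time-alignment kernel between two sequences of frames.
    Maximises the weighted sum of frame kernels over monotone alignment paths."""
    n, m = len(xs), len(ys)
    neg_inf = float("-inf")
    # G[i][j]: best accumulated score aligning xs[:i] with ys[:j] (1-based).
    G = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = frame_kernel(xs[i - 1], ys[j - 1])
            if i == 1 and j == 1:
                G[i][j] = 2 * k  # the path starts with a diagonal-weighted step
            else:
                G[i][j] = max(G[i - 1][j] + k,          # advance in xs only
                              G[i - 1][j - 1] + 2 * k,  # advance in both
                              G[i][j - 1] + k)          # advance in ys only
    # Every path has total step weight n + m, so this keeps the value in (0, 1].
    return G[n][m] / (n + m)

seq_a = [[0.0], [1.0], [2.0]]
seq_b = [[0.0], [0.0], [1.0], [2.0]]   # seq_a with its first frame repeated
seq_c = [[5.0], [6.0], [7.0]]

same = dtak(seq_a, seq_a)
stretched = dtak(seq_a, seq_b)
different = dtak(seq_a, seq_c)
```

Because the alignment absorbs the repeated frame, the stretched sequence scores as high as the identical one, whereas the unrelated sequence scores much lower. A function like `dtak` could then be passed to any SVM implementation that accepts a precomputed kernel matrix.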
A Parallel Mixture of SVMs for Very Large Scale Problems
Collobert, Ronan, Bengio, Samy, Bengio, Yoshua
Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems but they suffer from the complexity of their training algorithm, which is at least quadratic with respect to the number of examples. Hence, it is hopeless to try to solve real-life problems having more than a few hundred thousand examples with SVMs. The present paper proposes a new mixture of SVMs that can be easily implemented in parallel and where each SVM is trained on a small subset of the whole dataset. Experiments on a large benchmark dataset (Forest) as well as a difficult speech database yielded significant time improvement (time complexity appears empirically to locally grow linearly with the number of examples). In addition, and surprisingly, a significant improvement in generalization was observed on Forest.
1 Introduction
Recently a lot of work has been done around Support Vector Machines [9], mainly due to their impressive generalization performances on classification problems when compared to other algorithms such as artificial neural networks [3, 6].
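The scaling argument can be made concrete with a small back-of-the-envelope calculation. It assumes training cost is exactly quadratic in the number of examples (the abstract only says "at least quadratic"), that subsets have a fixed size, and it ignores the cost of combining the experts; the function names are hypothetical.

```python
def quadratic_cost(n_examples):
    """Idealised training cost of one SVM, in arbitrary units (assumed exactly n^2)."""
    return n_examples * n_examples

def mixture_cost(n_examples, subset_size):
    """Total cost of training one SVM per subset, assuming subset_size divides n_examples."""
    n_subsets = n_examples // subset_size
    return n_subsets * quadratic_cost(subset_size)

n = 100_000
s = 1_000

single = quadratic_cost(n)                 # one SVM on everything
mixture = mixture_cost(n, s)               # 100 SVMs on 1,000 examples each
speedup = single / mixture

# Doubling the data quadruples the single-SVM cost but only doubles the mixture cost.
growth_single = quadratic_cost(2 * n) / single
growth_mixture = mixture_cost(2 * n, s) / mixture
```

Under these assumptions the total mixture cost is `n * s`, which is linear in the number of examples for a fixed subset size, and the subsets can additionally be trained in parallel.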
Duality, Geometry, and Support Vector Regression
We develop an intuitive geometric framework for support vector regression (SVR). By examining when ɛ-tubes exist, we show that SVR can be regarded as a classification problem in the dual space. Hard and soft ɛ-tubes are constructed by separating the convex or reduced convex hulls respectively of the training data with the response variable shifted up and down by ɛ. A novel SVR model is proposed based on choosing the max-margin plane between the two shifted datasets.
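The reduction to classification can be sketched as follows. Each training pair is copied twice, once with the response shifted up by ɛ (labelled +1) and once shifted down by ɛ (labelled -1); a regression function fits inside a hard ɛ-tube exactly when its graph lies between the two shifted sets. The sketch restricts attention to graphs of linear functions f(x) = w·x + b, which is a simplifying assumption, and all names are hypothetical.

```python
def shifted_datasets(xs, ys, eps):
    """Build the two-class problem: (x, y + eps) labelled +1 and (x, y - eps) labelled -1."""
    upper = [(list(x) + [y + eps], +1) for x, y in zip(xs, ys)]
    lower = [(list(x) + [y - eps], -1) for x, y in zip(xs, ys)]
    return upper + lower

def separates(w, b, dataset):
    """True if the graph of f(x) = w.x + b lies weakly below every +1 point
    and weakly above every -1 point."""
    for point, label in dataset:
        x, t = point[:-1], point[-1]
        f = sum(wi * xi for wi, xi in zip(w, x)) + b
        if label * (t - f) < 0:
            return False
    return True

# Toy data scattered around the line y = x, with residuals of magnitude about 0.1.
xs = [[0.0], [1.0], [2.0]]
ys = [0.1, 0.9, 2.1]

wide_tube = separates([1.0], 0.0, shifted_datasets(xs, ys, eps=0.2))
narrow_tube = separates([1.0], 0.0, shifted_datasets(xs, ys, eps=0.05))
```

With ɛ = 0.2 the line y = x separates the shifted sets, so a hard ɛ-tube exists; with ɛ = 0.05 the same line no longer separates them.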
Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
We propose a framework based on a parametric quadratic programming (QP) technique to solve the support vector machine (SVM) training problem. This framework can be specialized to obtain two SVM optimization methods. The first solves the fixed bias problem, while the second starts with an optimal solution for a fixed bias problem and adjusts the bias until the optimal value is found. The latter method can be applied in conjunction with any other existing technique which obtains a fixed bias solution. Moreover, the second method can also be used independently to solve the complete SVM training problem. A combination of these two methods is more flexible than each individual method and, among other things, produces an incremental algorithm which exactly solves the 1-Norm Soft Margin SVM optimization problem. Applying Selective Sampling techniques may further boost convergence.
Kernel Logistic Regression and the Import Vector Machine
The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an ongoing research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
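The probability estimate can be illustrated with a minimal prediction sketch for binary kernel logistic regression, where the decision function is a kernel expansion over a small set of points and is passed through the logistic function. The RBF kernel, the coefficients, and the chosen points are hypothetical; how the IVM selects its import points and fits the coefficients is not shown here.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def klr_probability(x, import_points, coeffs, bias=0.0, kernel=rbf_kernel):
    """Estimated P(y = +1 | x) from a kernel expansion over the import points."""
    f = bias + sum(a * kernel(x, xi) for a, xi in zip(coeffs, import_points))
    return 1.0 / (1.0 + math.exp(-f))

# Hypothetical model with two import points: one pulling towards class +1,
# one pulling towards class -1.
import_points = [[0.0, 0.0], [3.0, 3.0]]
coeffs = [2.0, -2.0]

p_near_positive = klr_probability([0.1, 0.0], import_points, coeffs)
p_near_negative = klr_probability([3.0, 2.9], import_points, coeffs)
p_untrained = klr_probability([1.0, 1.0], import_points, [0.0, 0.0])
```

Prediction cost grows with the number of import points rather than with the size of the training set, which is where the computational advantage described above would come from.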