Support Vector Machines
Support Vector Machines with a Reject Option
Grandvalet, Yves, Rakotomamonjy, Alain, Keshet, Joseph, Canu, Stรฉphane
We consider the problem of binary classification where the classifier may abstain instead of classifying each observation. The Bayes decision rule for this setup, known as Chow's rule, is defined by two thresholds on posterior probabilities. From simple desiderata, namely the consistency and the sparsity of the classifier, we derive the double hinge loss function that focuses on estimating conditional probabilities only in the vicinity of the threshold points of the optimal decision rule. We show that, for suitable kernel machines, our approach is universally consistent. We cast the problem of minimizing the double hinge loss as a quadratic program akin to the standard SVM optimization problem and propose an active set method to solve it efficiently.
Speaker Comparison with Inner Product Discriminant Functions
Karam, Zahi, Sturim, Douglas, Campbell, William M.
Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications---speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison: support vector machines, joint factor analysis, and linear scoring.
Universal Consistency of Multi-Class Support Vector Classification
Steinwart was the first to prove universal consistency of support vector machine classification. His proof analyzed the'standard' support vector machine classifier, which is restricted to binary classification problems. In contrast, recent analysis has resulted in the common belief that several extensions of SVM classification to more than two classes are inconsistent. Our proof extends Steinwart's techniques to the multi-class case. Papers published at the Neural Information Processing Systems Conference.
Spatial and anatomical regularization of SVM for brain image analysis
Cuingnet, Remi, Chupin, Marie, Benali, Habib, Colliot, Olivier
Support vector machines (SVM) are increasingly used in brain image analyses since they allow capturing complex multivariate relationships in the data. Moreover, when the kernel is linear, SVMs can be used to localize spatial patterns of discrimination between two groups of subjects. However, the features' spatial distribution is not taken into account. As a consequence, the optimal margin hyperplane is often scattered and lacks spatial coherence, making its anatomical interpretation difficult. This paper introduces a framework to spatially regularize SVM for brain image analysis.
Universal Kernels on Non-Standard Input Spaces
Christmann, Andreas, Steinwart, Ingo
During the last years support vector machines (SVMs) have been successfully applied even in situations where the input space $X$ is not necessarily a subset of $R d$. Examples include SVMs using probability measures to analyse e.g. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) $H\subset L_p(P_X)$ is dense, or if the SVM is based on a universal kernel $k$. So far, however, there are no RKHSs of practical interest known that satisfy these assumptions on $\cH$ or $k$ if $X ot\subset R d$. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subset of $R d$. We apply this technique for the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
Learning with Recursive Perceptual Representations
Vinyals, Oriol, Jia, Yangqing, Deng, Li, Darrell, Trevor
Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks but require high dimensional feature spaces for good performance. Deep learning methods can find more compact representations but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs recursively transforming the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales as linear SVMs, does not rely on any kernel computations or nonconvex optimization, and exhibits better generalization ability than kernel-based SVMs.
On Coresets for Support Vector Machines
Tukan, Murad, Baykal, Cenk, Feldman, Dan, Rus, Daniela
We present an efficient coreset construction algorithm for large-scale Support Vector Machine (SVM) training in Big Data and streaming applications. A coreset is a small, representative subset of the original data points such that a models trained on the coreset are provably competitive with those trained on the original data set. Since the size of the coreset is generally much smaller than the original set, our preprocess-then-train scheme has potential to lead to significant speedups when training SVM models. We prove lower and upper bounds on the size of the coreset required to obtain small data summaries for the SVM problem. As a corollary, we show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings. We evaluate the performance of our algorithm on real-world and synthetic data sets. Our experimental results reaffirm the favorable theoretical properties of our algorithm and demonstrate its practical effectiveness in accelerating SVM training.
Approximating Concavely Parameterized Optimization Problems
Giesen, Joachim, Mueller, Jens, Laue, Soeren, Swiercy, Sascha
We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the parameter can always be approximated with accuracy $\varepsilon 0$ by a set of size $O(1/\sqrt{\varepsilon})$. A lower bound of size $\Omega (1/\sqrt{\varepsilon})$ shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an approximate path of size $O(1/\sqrt{\varepsilon})$. Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion. Papers published at the Neural Information Processing Systems Conference.
Advice Refinement in Knowledge-Based SVMs
Kunapuli, Gautam, Maclin, Richard, Shavlik, Jude W.
Knowledge-based support vector machines (KBSVMs) incorporate advice from domain experts, which can improve generalization significantly. A major limitation that has not been fully addressed occurs when the expert advice is imperfect, which can lead to poorer models. We propose a model that extends KBSVMs and is able to not only learn from data and advice, but also simultaneously improve the advice. The proposed approach is particularly effective for knowledge discovery in domains with few labeled examples. The proposed model contains bilinear constraints, and is solved using two iterative approaches: successive linear programming and a constrained concave-convex approach.
Infinite Latent SVM for Classification and Multi-task Learning
Zhu, Jun, Chen, Ning, Xing, Eric P.
Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes' theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.