Goto

Collaborating Authors

 Support Vector Machines


Safe and Efficient Screening For Sparse Support Vector Machine

arXiv.org Machine Learning

Assume that X E Him" is a data set containing 71 samples, X: (x1, . . . Let w*()\) be the optimal solution of Eq. (1) All the features With nonzero values in "w" (A) are called active The Lagrangian multiplier [1] of the problem defined in Eq. (1) is: The Eq. (2) can be reformulated as: Since the problem defined in Eq. (1) is convex and the optimal value of the In the preceding equation i'j: ij, and Y is a diagonal matrix and YM: When the input is given, it can be obtained in a closed form. The Ll--regularized L2--Loss SVM in Eq. (1) can be rewritten in an uncon-- Eq. (22) shows that the necessary condition for a feature f to be active in the To bound value of 0Tf' 7 we need to first construct a closed convex set K that We first study how to construct the convex set K. In the following, we construct a closed convex set K based on Eq. (19) and The proof of this proposition can be found in [2]. Let 01 and 02 be the optimal solutions of the problem defined in Eq. (19) for Assume that /\1 A2, and 01 is known. In the preceding equations, 01, A1, and /\2 are known. Figure 1 shows an example of the K in a two dimensional space. And K is indicated by the shaded area. It is indicated by the shaded area. Besides the n dimensional hyperball defined in Eq. (32), it is possible to By applying Proposition 6.1 to the objective function defined in Eq. (33) for 01, Let t::--: Z 0. By substituting 0: 02 and 0: 01 into Eq. Eq. (35)7 respectively, and then combining the two obtained equations7 the As the value of t change from 0 to 007 Eq. (36) generates a series of hyperball. Eq. (36) reaches it minimum when, The theorem can be proved by minimizing the 7" defined in Eq. (36).


Para-active learning

arXiv.org Machine Learning

Training examples are not all equally informative. Active learning strategies leverage this observation in order to massively reduce the number of examples that need to be labeled. We leverage the same observation to build a generic strategy for parallelizing learning algorithms. This strategy is effective because the search for informative examples is highly parallelizable and because we show that its performance does not deteriorate when the sifting process relies on a slightly outdated model. Parallel active learning is particularly attractive to train nonlinear models with non-linear representations because there are few practical parallel learning algorithms for such models. We report preliminary experiments using both kernel SVMs and SGD-trained neural networks.


Scaling SVM and Least Absolute Deviations via Exact Data Reduction

arXiv.org Machine Learning

The support vector machine (SVM) is a widely used method for classification. Although many efforts have been devoted to develop efficient solvers, it remains challenging to apply SVM to large-scale problems. A nice property of SVM is that the non-support vectors have no effect on the resulting classifier. Motivated by this observation, we present fast and efficient screening rules to discard non-support vectors by analyzing the dual problem of SVM via variational inequalities (DVI). As a result, the number of data instances to be entered into the optimization can be substantially reduced. Some appealing features of our screening method are: (1) DVI is safe in the sense that the vectors discarded by DVI are guaranteed to be non-support vectors; (2) the data set needs to be scanned only once to run the screening, whose computational cost is negligible compared to that of solving the SVM problem; (3) DVI is independent of the solvers and can be integrated with any existing efficient solvers. We also show that the DVI technique can be extended to detect non-support vectors in the least absolute deviations regression (LAD). To the best of our knowledge, there are currently no screening methods for LAD. We have evaluated DVI on both synthetic and real data sets. Experiments indicate that DVI significantly outperforms the existing state-of-the-art screening rules for SVM, and is very effective in discarding non-support vectors for LAD. The speedup gained by DVI rules can be up to two orders of magnitude.


On the Suitable Domain for SVM Training in Image Coding

arXiv.org Machine Learning

Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation. Geometrically, this implies allowing arbitrary signal distortions in an $n$-dimensional rectangle defined by the $\varepsilon$-insensitivity zone in each dimension of the selected image representation domain. Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. Taking into account these relations would imply using non-rectangular $\varepsilon$-insensitivity regions (allowing coupled distortions in different coefficients), which is beyond the conventional SVM formulation. In this paper, we report a condition on the suitable domain for developing efficient SVM image coding schemes. We analytically demonstrate that no linear domain fulfills this condition because of the statistical and perceptual inter-coefficient relations that exist in these domains. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains and in a recently proposed non-linear perceptual domain that simultaneously reduces the statistical and perceptual relations (so it is closer to fulfilling the proposed condition). These results highlight the relevance of an appropriate choice of the image representation before SVM learning.


A Novel Frank-Wolfe Algorithm. Analysis and Applications to Large-Scale SVM Training

arXiv.org Artificial Intelligence

Recently, there has been a renewed interest in the machine learning community for variants of a sparse greedy approximation procedure for concave optimization known as {the Frank-Wolfe (FW) method}. In particular, this procedure has been successfully applied to train large-scale instances of non-linear Support Vector Machines (SVMs). Specializing FW to SVM training has allowed to obtain efficient algorithms but also important theoretical results, including convergence analysis of training algorithms and new characterizations of model sparsity. In this paper, we present and analyze a novel variant of the FW method based on a new way to perform away steps, a classic strategy used to accelerate the convergence of the basic FW procedure. Our formulation and analysis is focused on a general concave maximization problem on the simplex. However, the specialization of our algorithm to quadratic forms is strongly related to some classic methods in computational geometry, namely the Gilbert and MDM algorithms. On the theoretical side, we demonstrate that the method matches the guarantees in terms of convergence rate and number of iterations obtained by using classic away steps. In particular, the method enjoys a linear rate of convergence, a result that has been recently proved for MDM on quadratic forms. On the practical side, we provide experiments on several classification datasets, and evaluate the results using statistical tests. Experiments show that our method is faster than the FW method with classic away steps, and works well even in the cases in which classic away steps slow down the algorithm. Furthermore, these improvements are obtained without sacrificing the predictive accuracy of the obtained SVM model.


Flexible High-dimensional Classification Machines and Their Asymptotic Properties

arXiv.org Machine Learning

Classification is an important topic in statistics and machine learning with great potential in many real applications. In this paper, we investigate two popular large margin classification methods, Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD), under two contexts: the high-dimensional, low-sample size data and the imbalanced data. A unified family of classification machines, the FLexible Assortment MachinE (FLAME) is proposed, within which DWD and SVM are special cases. The FLAME family helps to identify the similarities and differences between SVM and DWD. It is well known that many classifiers overfit the data in the high-dimensional setting; and others are sensitive to the imbalanced data, that is, the class with a larger sample size overly influences the classifier and pushes the decision boundary towards the minority class. SVM is resistant to the imbalanced data issue, but it overfits high-dimensional data sets by showing the undesired data-piling phenomena. The DWD method was proposed to improve SVM in the high-dimensional setting, but its decision boundary is sensitive to the imbalanced ratio of sample sizes. Our FLAME family helps to understand an intrinsic connection between SVM and DWD, and improves both methods by providing a better trade-off between sensitivity to the imbalanced data and overfitting the high-dimensional data. Several asymptotic properties of the FLAME classifiers are studied. Simulations and real data applications are investigated to illustrate the usefulness of the FLAME classifiers.


Latent Fisher Discriminant Analysis

arXiv.org Machine Learning

Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification. Previous studies have also extended the binary-class case into multi-classes. However, many applications, such as object detection and keyframe extraction cannot provide consistent instance-label pairs, while LDA requires labels on instance level for training. Thus it cannot be directly applied for semi-supervised classification problem. In this paper, we overcome this limitation and propose a latent variable Fisher discriminant analysis model. We relax the instance-level labeling into bag-level, is a kind of semi-supervised (video-level labels of event type are required for semantic frame extraction) and incorporates a data-driven prior over the latent variables. Hence, our method combines the latent variable inference and dimension reduction in an unified bayesian framework. We test our method on MUSK and Corel data sets and yield competitive results compared to the baseline approach. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors ranking-based experimental evaluation, which clearly demonstrates our proposed method consistently extracts more semantically meaningful keyframes than challenging baselines.


mTim: Rapid and accurate transcript reconstruction from RNA-Seq data

arXiv.org Machine Learning

High-throughput sequencing technology applied to cellular mRNA (RNA-Seq) has revolutionized transcriptome studies [19, 17, 35, among many others]. In contrast to microarray platforms, which it has replaced in many applications, RNA-Seq can not only be used to accurately quantify known transcripts, but also to reveal the precise structure of transcripts at single-nucleotide resolution. RNA-Seq based transcript reconstruction has therefore become a valuable tool for the completion of genome annotations [22, 11, for instance] and further enabled subsequent analyses of differentially expressed genes [2], transcript isoforms [6, 4] and exons [3], all of which generally rely on correctly inferred transcript inventories. De novo transcript reconstruction is thus a pivotal step in the analysis of RNA-Seq data. There are two conceptually different strategies to approach this problem: one can either assemble transcripts directly from RNA-Seq reads using methodology that originated from genome assembly approaches [13, 23, 25].


Network Anomaly Detection: A Survey and Comparative Analysis of Stochastic and Deterministic Methods

arXiv.org Machine Learning

We present five methods to the problem of network anomaly detection. These methods cover most of the common techniques in the anomaly detection field, including Statistical Hypothesis Tests (SHT), Support Vector Machines (SVM) and clustering analysis. We evaluate all methods in a simulated network that consists of nominal data, three flow-level anomalies and one packet-level attack. Through analyzing the results, we point out the advantages and disadvantages of each method and conclude that combining the results of the individual methods can yield improved anomaly detection results.


Local Support Vector Machines:Formulation and Analysis

arXiv.org Machine Learning

We provide a formulation for Local Support Vector Machines (LSVMs) that generalizes previous formulations, and brings out the explicit connections to local polynomial learning used in nonparametric estimation literature. We investigate the simplest type of LSVMs called Local Linear Support Vector Machines (LLSVMs). For the first time we establish conditions under which LLSVMs make Bayes consistent predictions at each test point $x_0$. We also establish rates at which the local risk of LLSVMs converges to the minimum value of expected local risk at each point $x_0$. Using stability arguments we establish generalization error bounds for LLSVMs.