Support Vector Machines
STC Anti-spoofing Systems for the ASVspoof 2015 Challenge
Novoselov, Sergey, Kozlov, Alexandr, Lavrentyeva, Galina, Simonchik, Konstantin, Shchemelinin, Vadim
This paper presents the Speech Technology Center (STC) systems submitted to Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge 2015. In this work we investigate different acoustic feature spaces to determine reliable and robust countermeasures against spoofing attacks. In addition to the commonly used front-end MFCC features we explored features derived from phase spectrum and features based on applying the multiresolution wavelet transform. Similar to state-of-the-art ASV systems, we used the standard TV-JFA approach for probability modelling in spoofing detection systems. Experiments performed on the development and evaluation datasets of the Challenge demonstrate that the use of phase-related and wavelet-based features provides a substantial input into the efficiency of the resulting STC systems. In our research we also focused on the comparison of the linear (SVM) and nonlinear (DBN) classifiers.
Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm
Germain, Pascal, Lacasse, Alexandre, Laviolette, François, Marchand, Mario, Roy, Jean-Francis
We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in the training data. The analysis intends to be self-contained and can be used as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence and others allow kernel functions to be used as voters (via the sample compression setting). Finally, out of the analysis, we propose the MinCq learning algorithm that basically minimizes the C-bound. MinCq reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine.
Optimal Learning Rates for Localized SVMs
One of the limiting factors of using support vector machines (SVMs) in large scale applications are their super-linear computational requirements in terms of the number of training samples. To address this issue, several approaches that train SVMs on many small chunks of large data sets separately have been proposed in the literature. So far, however, almost all these approaches have only been empirically investigated. In addition, their motivation was always based on computational requirements. In this work, we consider a localized SVM approach based upon a partition of the input space. For this local SVM, we derive a general oracle inequality. Then we apply this oracle inequality to least squares regression using Gaussian kernels and deduce local learning rates that are essentially minimax optimal under some standard smoothness assumptions on the regression function. This gives the first motivation for using local SVMs that is not based on computational requirements but on theoretical predictions on the generalization performance. We further introduce a data-dependent parameter selection method for our local SVM approach and show that this method achieves the same learning rates as before. Finally, we present some larger scale experiments for our localized SVM showing that it achieves essentially the same test performance as a global SVM for a fraction of the computational requirements. In addition, it turns out that the computational requirements for the local SVMs are similar to those of a vanilla random chunk approach, while the achieved test errors are significantly better.
Manitest: Are classifiers really invariant?
Fawzi, Alhussein, Frossard, Pascal
Invariance to geometric transformations is a highly desirable property of automatic classifiers in many image recognition tasks. Nevertheless, it is unclear to which extent state-of-the-art classifiers are invariant to basic transformations such as rotations and translations. This is mainly due to the lack of general methods that properly measure such an invariance. In this paper, we propose a rigorous and systematic approach for quantifying the invariance to geometric transformations of any classifier. Our key idea is to cast the problem of assessing a classifier's invariance as the computation of geodesics along the manifold of transformed images. We propose the Manitest method, built on the efficient Fast Marching algorithm to compute the invariance of classifiers. Our new method quantifies in particular the importance of data augmentation for learning invariance from data, and the increased invariance of convolutional neural networks with depth. We foresee that the proposed generic tool for measuring invariance to a large class of geometric transformations and arbitrary classifiers will have many applications for evaluating and comparing classifiers based on their invariance, and help improving the invariance of existing classifiers.
An Efficient Classifier Based on Hierarchical Mixing Linear Support Vector Machines
Wang, Di (Wenzhou University) | Zhang, Xiaoqin (Wenzhou University) | Fan, Mingyu (Wenzhou University) | Ye, Xiuzi (Wenzhou University)
SVM in advance, and this limits their applications to largescale problems. To address this issue, several methods for Support vector machines (SVMs) play a very dominant selecting a set of basis vectors are proposed. They include role in data classification due to their good sampling from the training set in the Nystrom method generalization performance. However, they suffer [Williams and Seeger, 2001] and variants of the Incomplete from the high computational complexity in the Cholesky factorization [Bach and Jordan, 2005], core vector classification phase when there are a considerable machine (CVM) [Tsang et al., 2005], relevance vector machine number of support vectors (SVs). Then it is desirable (RVM)[Tipping, 2001], and relevance units machine to design efficient algorithms in the classification (RUM)[Gao and Zhang, 2009]. Wu et al. [Wu et al., 2006] phase to deal with the datasets of realtime add one constraint on the number of basis vectors to the standard pattern recognition systems. To this end, we SVM optimization problem, and then solve this modified propose a novel classifier called HMLSVMs (Hierarchical nonconvex problem to build sparse kernel learning algorithms Mixing Linear Support Vector Machines) (SKLA). Joachims and Yu [Joachims and Yu, 2009] in this paper, which has a hierarchical structure explore a new sparse kernel SVMs via cutting plane training, with a mixing linear SVMs classifier at each node called cutting-plane subspace pursuit (CPSP).Although and predicts the label of a sample using only a the above methods prunes the SVs and reduces computational few hyperplanes. We also give a generalization complexity in classification phase, when a new test sample is error bound for the class of locally linear SVMs introduced, they still need to compare it with these pruned (LLSVMs) based on the Rademacher theory, which SVs via kernel calculations to predict the label of the test ensures that overfitting can be effectively avoided.
Feature Selection from Microarray Data via an Ordered Search with Projected Margin
Villela, Saulo Moraes (Federal University of Juiz de Fora) | Leite, Saul de Castro (Federal University of Juiz de Fora) | Neto, Raul Fonseca (Federal University of Juiz de Fora)
Microarray experiments are capable of measuring the expression level of thousands of genes simultaneously. Dealing with this enormous amount of information requires complex computation. Support Vector Machines (SVM) have been widely used with great efficiency to solve classification problems that have high dimension. In this sense, it is plausible to develop new feature selection strategies for microarray data that are associated with this type of classifier. Therefore, we propose, in this paper, a new method for feature selection based on an ordered search process to explore the space of possible subsets. The algorithm, called Admissible Ordered Search (AOS), uses as evaluation function the margin values estimated for each hypothesis by a SVM classifier. An important theoretical contribution of this paper is the development of the projected margin concept. This value is computed as the margin vector projection on a lower dimensional subspace and is used as an upper bound for the current value of the hypothesis in the search process. This enables great economy in runtime and consequently efficiency in the search process as a whole. The algorithm was tested using five different microarray data sets yielding superior results when compared to three representative feature selection methods.
Optimizing Locally Linear Classifiers with Supervised Anchor Point Learning
Mao, Xue (Chinese Academy of Sciences) | Fu, Zhouyu (University of Western Sydney) | Wu, Ou (Chinese Academy of Sciences) | Hu, Weiming (Chinese Academy of Sciences)
Kernel SVM suffers from high computational complexity when dealing with large-scale nonlinear datasets. To address this issue, locally linear classifiers have been proposed for approximating nonlinear decision boundaries with locally linear functions using a local coding scheme. The effectiveness of such coding scheme depends heavily on the quality of anchor points chosen to produce the local codes. Existing methods usually involve a phase of unsupervised anchor point learning followed by supervised classifier learning. Thus, the anchor points and classifiers are obtained separately whereas the learned anchor points may not be optimal for the discriminative task. In this paper, we present a novel fully supervised approach for anchor point learning. A single optimization problem is formulated over both anchor point and classifier variables, optimizing the initial anchor points jointly with the classifiers to minimize the classification risk. Experimental results show that our method outperforms other competitive methods which employ unsupervised anchor point learning and achieves performance on par with the kernel SVM albeit with much improved efficiency.
Data Sparseness in Linear SVM
Li, Xiang (University of Western Ontario and National University of Defense Technology) | Wang, Huaimin (National University of Defense Technology) | Gu, Bin (Nanjing University of Information Science Technology and University of Western Ontario) | Ling, Charles X. (University of Western Ontario)
Large sparse datasets are common in many real-world applications. Linear SVM has been shown to be very efficient for classifying such datasets. However, it is still unknown how data sparseness would affect its convergence behavior. To study this problem in a systematic manner, we propose a novel approach to generate large and sparse data from real-world datasets, using statistical inference and the data sampling process in the PAC framework. We first study the convergence behavior of linear SVM experimentally, and make several observations, useful for real-world applications. We then offer theoretical proofs for our observations by studying the Bayes risk and PAC bound. Our experiment and theoretic results are valuable for learning large sparse datasets with linear SVM.
Pre-release Prediction of Crowd Opinion on Movies by Label Distribution Learning
Geng, Xin (Southeast University) | Hou, Peng (Southeast University)
This paper studies an interesting problem: is it possible to predict the crowd opinion about a movie before the movie is actually released? The crowd opinion is here expressed by the distribution of ratings given by a sufficient amount of people. Consequently, the pre-release crowd opinion prediction can be regarded as a Label Distribution Learning (LDL) problem. In order to solve this problem, a Label Distribution Support Vector Regressor (LDSVR) is proposed in this paper. The basic idea of LDSVR is to fit a sigmoid function to each component of the label distribution simultaneously by a multi-output support vector machine. Experimental results show that LDSVR can accurately predict peoples’s rating distribution about a movie just based on the pre-release metadata of the movie.
Modeling Inter- and Intra-Part Deformations for Object Structure Parsing
Cai, Ling (Xiamen University) | Ji, Rongrong (Xiamen University) | Liu, Wei (IBM T. J. Watson Research Center) | Hua, Gang (Stevens Institute of Technology)
Part deformation has been a longstanding challenge for object parsing, of which the primary difficulty lies in modeling the highly diverse object structures. To this end, we propose a novel structure parsing model to capture deformable object structures. The proposed model consists of two de-formable layers: the top layer is an undirected graph that incorporates inter-part deformations to infer object structures; the base layer is consisted of various independent nodes to characterize local intra-part deformations. To learn this two-layer model, we design a layer-wise learning algorithm,which employs matching pursuit and belief propagation for a low computational complexity inference. Specifically, active basis sparse coding is leveraged to build the nodes at the base layer, while the edge weights are estimated by a structural support vector machine. Experimental results on two benchmark datasets (i.e., faces and horses) demonstrate that the proposed model yields superior parsing performance over state-of-the-art models.