Support Vector Machines
Enhancements of Multi-class Support Vector Machine Construction from Binary Learners using Generalization Performance
Songsiri, Patoomsiri, Phetkaew, Thimaporn, Kijsirikul, Boonserm
We propose several novel methods for enhancing the multi-class SVMs by applying the generalization performance of binary classifiers as the core idea. This concept will be applied on the existing algorithms, i.e., the Decision Directed Acyclic Graph (DDAG), the Adaptive Directed Acyclic Graphs (ADAG), and Max Wins. Although in the previous approaches there have been many attempts to use some information such as the margin size and the number of support vectors as performance estimators for binary SVMs, they may not accurately reflect the actual performance of the binary SVMs. We show that the generalization ability evaluated via a cross-validation mechanism is more suitable to directly extract the actual performance of binary SVMs. Our methods are built around this performance measure, and each of them is crafted to overcome the weakness of the previous algorithm. The proposed methods include the Reordering Adaptive Directed Acyclic Graph (RADAG), Strong Elimination of the classifiers (SE), Weak Elimination of the classifiers (WE), and Voting based Candidate Filtering (VCF). Experimental results demonstrate that our methods give significantly higher accuracy than all of the traditional ones. Especially, WE provides significantly superior results compared to Max Wins which is recognized as the state of the art algorithm in terms of both accuracy and classification speed with two times faster in average. Introduction The support vector machine (SVM) [1, 2] is a high performance learning algorithm constructing a hyperplane to separate two-class data by maximizing the margin between them. There are two approaches for extending SVMs to multi-class problems, i.e., solving the problem by formulating all classes of data under a single optimization, and combining several two-class subproblems. However, the difficulty and complexity to solve the problem with the first method are due to the increase of the number of classes and the size of training data, so the second method is more suitable for practical use. In this paper, we focus on the second approach. For constructing a multi-class classifier from binary ones, the method called one-against-one trains each binary classifier on only two out ofN classes, and builds N (N 1)/ 2 possible classifiers. Several strategies have been proposed for combining the trained classifiers to make the final classification for an unseen data. Friedman [3] suggested the combination strategy called Max Wins .
Optimizing Cepstral Features for Audio Classification
Fu, Zhouyu (University of Western Sydney, Kingswood Campus) | Lu, Guojun (Monash University, Gippsland Campus) | Ting, Kai Ming (Monash University Gippsland Campus) | Zhang, Dengsheng (Monash University, Gippsland Campus)
Cepstral features have been widely used in audio applications. Domain knowledge has played an important role in designing different types of cepstral features proposed in the literature. In this paper, we present a novel approach for learning optimized cepstral features directly from audio data to better discriminate between different categories of signals in classification tasks. We employ multi-layer feedforward neural networks to model the cepstral feature extraction process. The network weights are initialized to replicate a reference cepstral feature like the mel frequency cepstral coefficient. We then propose a embedded approach that integrates feature learning with the training of a support vector machine (SVM) classifier. A single optimization problem is formulated where the feature and classifier variables are optimized simultaneously so as to refine the initial features and minimize the classification risk. Experimental results have demonstrated the effectiveness of the proposed feature learning approach, outperforming competing methods by a large margin on benchmark data.
Learning Visual Symbols for Parsing Human Poses in Images
Wang, Fang (NICTA,ย Nanjing University of Science and Technology) | Li, Yi (NICTA)
Parsing human poses in images is fundamental in extracting critical visual information for artificial intelligent agents. Our goal is to learn self-contained body part representations from images, which we call visual symbols, and their symbol-wise geometric contexts in this parsing process. Each symbol is individually learned by categorizing visual features leveraged by geometric information. In the categorization, we use Latent Support Vector Machine followed by an efficient cross validation procedure. Then, these symbols naturally define geometric contexts of body parts in a fine granularity for effective inference. When the structure of the compositional parts is a tree, we derive an efficient approach to estimating human poses in images. Experiments on two large datasets suggest our approach outperforms state of the art methods.
Unlearning from Demonstration
Sullivan, Keith (George Mason University) | ElMolla, Ahmed (George Mason University) | Squires, Bill (George Mason University) | Luke, Sean (George Mason University)
When doing learning from demonstration, it is often the case that the demonstrator provides corrective examples to fix errant behavior by the agent or robot. We present a set of algorithms which use this corrective data to identify and remove noisy examples in datasets which caused errant classifications, and ultimately errant behavior. The objective is to actually modify the source datasets rather than solely rely on the noise-insensitivity of the classification algorithm. This is particularly useful in the sparse datasets often found in learning from demonstration experiments. Our approach tries to distinguish between noisy misclassification and mere undersampling of the learning space. If errors are a result of misclassification, we potentially remove the responsible points and update the classifier. We demonstrate our method on UCI Machine Learning datasets at different levels of sparsity and noise, using decision trees, K-Nearest-Neighbor, and support vector machines.
Multi-view Laplacian Support Vector Machines
We propose a new approach, multi-view Laplacian support vector machines (SVMs), for semi-supervised learning under the multi-view scenario. It integrates manifold regularization and multi-view regularization into the usual formulation of SVMs and is a natural extension of SVMs from supervised learning to multi-view semi-supervised learning. The function optimization problem in a reproducing kernel Hilbert space is converted to an optimization in a finite-dimensional Euclidean space. After providing a theoretical bound for the generalization performance of the proposed method, we further give a formulation of the empirical Rademacher complexity which affects the bound significantly. From this bound and the empirical Rademacher complexity, we can gain insights into the roles played by different regularization terms to the generalization performance. Experimental results on synthetic and real-world data sets are presented, which validate the effectiveness of the proposed multi-view Laplacian SVMs approach.
Supervised Metric Learning with Generalization Guarantees
The crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions, an area of research known as metric learning. When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance. Less work has been devoted to metric learning from structured objects (such as strings or trees), most of it focusing on optimizing a notion of edit distance. We identify two important limitations of current metric learning approaches. First, they allow to improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not been studied so far. Second, the question of the generalization ability of metric learning methods has been largely ignored. In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities. Our second contribution is a novel framework for learning string and tree edit similarities inspired by the recent theory of (e,g,t)-good similarity functions. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend these ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (e,g,t)-goodness. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms based on a notion of algorithmic robustness.
Bayesian Structured Prediction Using Gaussian Processes
Bratieres, Sebastien, Quadrianto, Novi, Ghahramani, Zoubin
We introduce a conceptually novel structured prediction model, GPstruct, which is kernelized, non-parametric and Bayesian, by design. We motivate the model with respect to existing approaches, among others, conditional random fields (CRFs), maximum margin Markov networks (M3N), and structured support vector machines (SVMstruct), which embody only a subset of its properties. We present an inference procedure based on Markov Chain Monte Carlo. The framework can be instantiated for a wide range of structured objects such as linear chains, trees, grids, and other general graphs. As a proof of concept, the model is benchmarked on several natural language processing tasks and a video gesture segmentation task involving a linear chain structure. We show prediction accuracies for GPstruct which are comparable to or exceeding those of CRFs and SVMstruct.
Does One-Against-All or One-Against-One Improve the Performance of Multiclass Classifications?
Eichelberger, Robert Kyle (University of Central Arkansas) | Sheng, Victor S. (Department of Computer Science, University of Central Arkansas)
One-against-all and one-against-one are two popular methodologies for reducing multiclass classification problems into a set of binary classifications. In this paper, we are interested in the performance of both one-against-all and one-against-one for classification algorithms, such as decision tree, naรฏve bayes, support vector machine, and logistic regression. Since both one-against-all and one-against-one work like creating a classification committee, they are expected to improve the performance of classification algorithms. However, our experimental results surprisingly show that one-against-all worsens the performance of the algorithms on most datasets. One-against-one helps, but performs worse than the same iterations of bagging these algorithms. Thus, we conclude that both one-against-all and one-against-one should not be used for the algorithms that can perform multiclass classifications directly. Bagging is better approach for improving their performance.
Accuracy and Timeliness in ML Based Activity Recognition
Ross, Robert (Dublin Institute of Technology) | Kelleher, John (Dublin Institute of Technology)
While recent Machine Learning (ML) based techniques for activity recognition show great promise, there remain a number of questions with respect to the relative merits of these techniques. To provide a better understanding of the relative strengths of contemporary Activity Recognition methods, in this paper we present a comparative analysis of Hidden Markov Model, Bayesian, and Support Vector Machine based human activity recognition models. The study builds on both pre-existing and newly annotated data which includes interleaved activities. Results demonstrate that while Support Vector Machine based techniques perform well for all data sets considered, simple representations of sensor histories regularly outperform more complex count based models.