Support Vector Machines
Adversarial Structured Prediction for Multivariate Measures
Wang, Hong, Rezaei, Ashkan, Ziebart, Brian D.
Many predicted structured objects (e.g., sequences, matchings, trees) are evaluated using the F-score, alignment error rate (AER), or other multivariate performance measures. Since inductively optimizing these measures using training data is typically computationally difficult, empirical risk minimization of surrogate losses is employed, using, e.g., the hinge loss for (structured) support vector machines. These approximations often introduce a mismatch between the learner's objective and the desired application performance, leading to inconsistency. We take a different approach: adversarially approximate training data while optimizing the exact F-score or AER. Structured predictions under this formulation result from solving zero-sum games between a predictor seeking the best performance and an adversary seeking the worst while required to (approximately) match certain structured properties of the training data. We explore this approach for word alignment (AER evaluation) and named entity recognition (F-score evaluation) with linear-chain constraints.
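For readers unfamiliar with the two measures named in this abstract, the sketch below computes them on sets. It is not the paper's adversarial method. It assumes the common set-based F-score (the balanced F1) and the usual alignment error rate defined over "sure" and "possible" gold links. The function names and the convention for empty inputs are illustrative assumptions.

```python
def f_score(predicted, gold):
    """Balanced F-score (F1) between a predicted and a gold set of items."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # assumed convention: two empty sets agree perfectly
    # 2PR / (P + R) simplifies to 2 * |overlap| / (|predicted| + |gold|).
    return 2 * len(predicted & gold) / (len(predicted) + len(gold))


def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S a subset of P."""
    predicted, sure = set(predicted), set(sure)
    possible = set(possible) | sure  # every sure link is also a possible link
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))


# Hypothetical example: entity ids for F1, (source, target) index pairs for AER.
f1 = f_score({1, 2, 3}, {2, 3, 4})
aer = alignment_error_rate(
    predicted={(0, 0), (2, 1), (2, 2)},
    sure={(0, 0), (1, 1)},
    possible={(2, 1)},
)
```

Both measures depend on the whole predicted set rather than on a sum of per-item losses, which is why the abstract calls them multivariate.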
Support vector comparison machines
Venuto, David, Hocking, Toby Dylan, Sphanurattana, Lakjaree, Sugiyama, Masashi
In ranking problems, the goal is to learn a ranking function from labeled pairs of input points. In this paper, we consider the related comparison problem, where the label indicates which element of the pair is better, or if there is no significant difference. We cast the learning problem as a margin maximization, and show that it can be solved by converting it to a standard SVM. We use simulated nonlinear patterns, a real learning to rank sushi data set, and a chess data set to show that our proposed SVMcompare algorithm outperforms SVMrank when there are equality pairs.
MIT's automated machine learning works 100x faster than human data scientists
A new automated machine learning system can analyze data and come up with a solution 100x faster than humans, according to a new paper from MIT and Michigan State University. This could potentially help businesses take advantage of machine learning's capabilities in a faster, easier way, while also filling data science talent gaps. The system also potentially marks a tipping point in machine learning adoption in the enterprise, which is expected to double in 2018, as TechRepublic's sister site ZDNet reported. When seeking a solution to a problem, data scientists must wade through huge datasets, and choose the modeling technique they believe will work best. The issue is, there are hundreds of techniques to choose from, including neural networks and support vector machines, and choosing the best one could potentially mean the difference between millions of dollars in ad revenue or none, or catching a flaw in a medical device or not.
Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
Devarakonda, Aditya, Fountoulakis, Kimon, Demmel, James, Mahoney, Michael W.
Parallel computing has played an important role in speeding up convex optimization methods for big data analytics and large-scale machine learning (ML). However, the scalability of these optimization methods is inhibited by the cost of communicating and synchronizing processors in a parallel setting. Iterative ML methods are particularly sensitive to communication cost since they often require communication every iteration. In this work, we extend well-known techniques from Communication-Avoiding Krylov subspace methods to first-order, block coordinate descent methods for Support Vector Machines and Proximal Least-Squares problems. Our Synchronization-Avoiding (SA) variants reduce the latency cost by a tunable factor of $s$ at the expense of a factor of $s$ increase in flops and bandwidth costs. We show that the SA-variants are numerically stable and can attain large speedups of up to $5.1\times$ on a Cray XC30 supercomputer.
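The trade-off described in this abstract can be illustrated with a simple latency/bandwidth/flops cost model. The sketch below is not taken from the paper. It assumes modeled time is gamma * flops + beta * words + alpha * messages, and that the SA variant multiplies flops and words by s while dividing the message count by s. All machine parameters in the example are hypothetical.

```python
import math


def modeled_time(flops, words, messages, gamma, beta, alpha):
    """Assumed cost model: time per flop, per word moved, and per message."""
    return gamma * flops + beta * words + alpha * messages


def sa_modeled_time(flops, words, messages, gamma, beta, alpha, s):
    """SA variant under the stated assumption: s times the flops and
    bandwidth, 1/s times the latency (message count)."""
    return modeled_time(s * flops, s * words, messages / s, gamma, beta, alpha)


def best_s(flops, words, messages, gamma, beta, alpha):
    """Minimizer of s * A + B / s, i.e. s = sqrt(B / A), under this model."""
    return math.sqrt(alpha * messages / (gamma * flops + beta * words))


# Hypothetical workload and machine parameters, chosen to be latency-bound.
params = dict(flops=1e6, words=1e4, messages=1e3,
              gamma=1e-9, beta=1e-8, alpha=1e-5)
baseline = modeled_time(**params)
with_s3 = sa_modeled_time(s=3, **params)
s_star = best_s(**params)
```

Under this model a larger s only pays off while the latency term dominates. Past the minimizer, the extra flops and bandwidth outweigh the latency savings.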
Understanding Career Progression in Baseball Through Machine Learning
Bierig, Brian, Hollenbeck, Jonathan, Stroud, Alexander
Abstract: Professional baseball players are increasingly guaranteed expensive long-term contracts, with over 70 deals signed in excess of $90 million, mostly in the last decade. These are substantial sums compared to a typical franchise valuation of $1-2 billion. Hence, the players to whom a team chooses to give such a contract can have an enormous impact on both competitiveness and profit. Despite this, most published approaches examining career progression in baseball are fairly simplistic. We applied four machine learning algorithms to the problem and soundly improved upon existing approaches, particularly for batting data. I. INTRODUCTION: The typical mode of entry for a player into baseball is through the first-year player draft. Players usually enter the draft immediately after high school or college and then spend several years in the drafting team's minor league system. When the player is deemed ready, the drafting team can promote the player to the Major Leagues.
Highly Efficient Human Action Recognition with Quantum Genetic Algorithm Optimized Support Vector Machine
Liu, Yafeng, Feng, Shimin, Zhao, Zhikai, Ding, Enjie
In this paper we propose the use of a quantum genetic algorithm to optimize the support vector machine (SVM) for human action recognition. The Microsoft Kinect sensor can be used for skeleton tracking, which provides the joints' position data. However, extracting motion features that represent the dynamics of a human skeleton is still a challenge due to the complexity of human motion. We present a highly efficient feature extraction method for action classification: the joint angles are used to represent a human skeleton, and the variance of each angle is calculated over an action time window. Using the proposed representation, we compared the human action classification accuracy of two approaches: the SVM optimized with the quantum genetic algorithm and the conventional SVM with grid search. Experimental results on the MSR-12 dataset show that the conventional SVM achieved an accuracy of $93.85\%$. The proposed approach outperforms the conventional method with an accuracy of $96.15\%$.
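The feature extraction described in this abstract (joint angles, then the variance of each angle over a time window) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The joint names, the choice of joint triplets, and the use of population variance are assumptions.

```python
import math


def joint_angle(a, b, c):
    """Angle in radians at joint b between the segments b->a and b->c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(vi * vi for vi in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def angle_variance_features(frames, triplets):
    """One feature per joint triplet: variance of its angle over the window.

    frames   -- list of dicts mapping a joint name to an (x, y, z) position
    triplets -- list of (a, b, c) joint names; the angle is measured at b
    """
    features = []
    for a, b, c in triplets:
        angles = [joint_angle(f[a], f[b], f[c]) for f in frames]
        mean = sum(angles) / len(angles)
        features.append(sum((t - mean) ** 2 for t in angles) / len(angles))
    return features


# Hypothetical two-frame window: the elbow opens from a right angle to straight.
window = [
    {"shoulder": (1, 0, 0), "elbow": (0, 0, 0), "wrist": (0, 1, 0)},
    {"shoulder": (1, 0, 0), "elbow": (0, 0, 0), "wrist": (-1, 0, 0)},
]
features = angle_variance_features(window, [("shoulder", "elbow", "wrist")])
```

The resulting fixed-length vector has one entry per triplet regardless of the window length, so it can be passed to any SVM implementation.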
Machine learning: Supervised methods
We'll illustrate SVM using a two-class problem and begin with ... Typically, C is chosen using cross-validation. Points at the margin's edge (black outlines) are called support vectors. The margin is now 0.64 with six support vectors.
Predicting Station-level Hourly Demands in a Large-scale Bike-sharing Network: A Graph Convolutional Neural Network Approach
Lin, Lei, He, Zhengbing, Peeta, Srinivas, Wen, Xuejin
Bike sharing is a vital piece in a modern multi-modal transportation system. However, it suffers from the bike unbalancing problem due to fluctuating spatial and temporal demands. Accurate bike sharing demand predictions can help operators plan optimal routes and schedules for bike redistribution, and thereby enhance system efficiency. In this study, we propose a novel Graph Convolutional Neural Network with Data-driven Graph Filter (GCNN-DDGF) model to predict station-level hourly demands in a large-scale bike-sharing network. With each station as a vertex in the network, the newly proposed GCNN-DDGF model is able to automatically learn the hidden correlations between stations, and thus overcomes a common issue reported in previous studies, i.e., that the quality and performance of GCNN models rely on the predefinition of the adjacency matrix. To show the performance of the proposed model, this study compares the GCNN-DDGF model with four GCNN models whose adjacency matrices are derived from different bike sharing system matrices: the Spatial Distance matrix (SD), the Demand matrix (DE), the Average Trip Duration matrix (ATD), and the Demand Correlation matrix (DC). The five types of GCNN models and the classic Support Vector Regression model are built on a Citi Bike dataset from New York City which includes 272 stations and over 28 million transactions from 2013 to 2016. Results show that the GCNN-DDGF model has the lowest Root Mean Square Error, followed by the GCNN-DC model, and the GCNN-ATD model has the worst performance. Through further examination, we find that the learned DDGF captures information similar to that embedded in the SD, DE and DC matrices, and it also uncovers more hidden heterogeneous pairwise correlations between stations that are not revealed by any of those matrices.
Latent Laplacian Maximum Entropy Discrimination for Detection of High-Utility Anomalies
Hou, Elizabeth, Sricharan, Kumar, Hero, Alfred O.
Anomaly detection is a very pervasive problem applicable to a variety of domains including network intrusion, fraud detection, and system failures. It is a crucial task in many applications because failure to detect anomalous activity could result in highly undesirable outcomes. For example, (i) detection of anomalous medical claims is important to identify fraud; (ii) detection of fraudulent credit card transactions is necessary to help prevent identity theft; and (iii) detection of abnormal network traffic is necessary to identify hacking. Many techniques have been developed for anomaly detection. These methods can be broadly classified into two categories: (i) rule-based systems, and (ii) statistical data-driven approaches. The rule-based systems are based on domain expertise and look for specific types of anomalies while the data-driven approaches look to identify anomalies by identifying statistically rare patterns. Examples of data-driven methods include parametric methods that assume a known family for the nominal (non-anomalous) distribution and nonparametric methods such as those using unsupervised or semi-supervised support vector machines (SVMs) [1], [2] or based on minimum volume set estimation [3], [4], [5]. The advantage of data-driven approaches over rule-based methods is that they can identify novel types of anomalies that are unknown to the domain expert.
An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification
Convolutional neural networks (CNNs) are similar to "ordinary" neural networks in the sense that they are made up of hidden layers consisting of neurons with "learnable" parameters. These neurons receive inputs, perform a dot product, and then follow it with a non-linearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) conducted to challenge this norm. The cited studies introduce the use of a linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, and is inspired by (Tang, 2013). Empirical data has shown that the CNN-SVM model was able to achieve a test accuracy of ~99.04% using the MNIST dataset (LeCun, Cortes, and Burges, 2010). On the other hand, the CNN-Softmax was able to achieve a test accuracy of ~99.23% using the same dataset. Both models were also tested on the recently published Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is supposed to be a more difficult image classification dataset than MNIST (Zalandoresearch, 2017). This proved to be the case, as CNN-SVM reached a test accuracy of ~90.72%, while the CNN-Softmax reached a test accuracy of ~91.86%. These results may be improved if data preprocessing techniques were employed on the datasets, and if the base CNN model were relatively more sophisticated than the one used in this study.
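The difference between the two last-layer classifiers compared in this abstract comes down to the loss applied to the class scores. The sketch below contrasts softmax cross-entropy with a one-vs-rest squared hinge loss in the style of Tang (2013). The exact loss used in the study is not stated in the abstract, so this formulation and the function names are assumptions.

```python
import math


def softmax_cross_entropy(scores, target):
    """Negative log-probability of the target class under a softmax."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]


def squared_hinge_one_vs_rest(scores, target):
    """Sum of squared hinge losses, treating each class as a binary problem.

    The target class is labeled +1 and every other class -1 (assumed setup).
    """
    loss = 0.0
    for k, s in enumerate(scores):
        y = 1.0 if k == target else -1.0
        loss += max(0.0, 1.0 - y * s) ** 2
    return loss


# Hypothetical class scores from the last layer for one image.
confident = [2.0, -2.0, -1.5]
uncertain = [0.5, -0.5]
hinge_confident = squared_hinge_one_vs_rest(confident, target=0)
hinge_uncertain = squared_hinge_one_vs_rest(uncertain, target=0)
ce_confident = softmax_cross_entropy(confident, target=0)
```

The hinge loss drops to exactly zero once every score clears the margin. The cross-entropy stays positive and keeps pushing the scores apart.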