Support Vector Machines
Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation
Aksakalli, Vural, Malekipirbazari, Milad
Feature selection (FS) has become an indispensable task in dealing with today's highly complex pattern recognition problems with massive number of features. In this study, we propose a new wrapper approach for FS based on binary simultaneous perturbation stochastic approximation (BSPSA). This pseudo-gradient descent stochastic algorithm starts with an initial feature vector and moves toward the optimal feature vector via successive iterations. In each iteration, the current feature vector's individual components are perturbed simultaneously by random offsets from a qualified probability distribution. We present computational experiments on datasets with numbers of features ranging from a few dozens to thousands using three widely-used classifiers as wrappers: nearest neighbor, decision tree, and linear support vector machine. We compare our methodology against the full set of features as well as a binary genetic algorithm and sequential FS methods using cross-validated classification error rate and AUC as the performance criteria. Our results indicate that features selected by BSPSA compare favorably to alternative methods in general and BSPSA can yield superior feature sets for datasets with tens of thousands of features by examining an extremely small fraction of the solution space. We are not aware of any other wrapper FS methods that are computationally feasible with good convergence properties for such large datasets.
Integrated Inference and Learning of Neural Factors in Structural Support Vector Machines
Houthooft, Rein, De Turck, Filip
Tackling pattern recognition problems in areas such as computer vision, bioinformatics, speech or text recognition is often done best by taking into account task-specific statistical relations between output variables. In structured prediction, this internal structure is used to predict multiple outputs simultaneously, leading to more accurate and coherent predictions. Structural support vector machines (SSVMs) are nonprobabilistic models that optimize a joint input-output function through margin-based learning. Because SSVMs generally disregard the interplay between unary and interaction factors during the training phase, final parameters are suboptimal. Moreover, its factors are often restricted to linear combinations of input features, limiting its generalization power. To improve prediction accuracy, this paper proposes: (i) Joint inference and learning by integration of back-propagation and loss-augmented inference in SSVM subgradient descent; (ii) Extending SSVM factors to neural networks that form highly nonlinear functions of input features. Image segmentation benchmark results demonstrate improvements over conventional SSVM training methods in terms of accuracy, highlighting the feasibility of end-to-end SSVM training with neural factors.
Maximum margin classifier working in a set of strings
Koyano, Hitoshi, Hayashida, Morihiro, Akutsu, Tatsuya
Numbers and numerical vectors account for a large portion of data. However, recently the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem on the asymptotic behavior of a consensus sequence of strings, which is the counterpart of the mean of numerical vectors, as demonstrated in the probability theory on a metric space of strings developed by one of the authors and his colleague in a previous study [18]. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein--protein interactions using amino acid sequences.
Learning to classify with possible sensor failures
Xie, Tianpei, Nasrabadi, Nasser M., Hero, Alfred O.
Large margin classifiers, such as the support vector machine (SVM) [1] and the maximum entropy discrimination (MED) classifier [2], have enjoyed great popularity in the signal processing and machine learning communities due to their broad applicability, robust performance, and the availability of fast software implementations. When the training data is representative of the test data, the performance of MED/SVM has theoretical guarantees that have been validated in practice [1], [3], [4]. Moreover, since the decision boundary of the MED/SVM is solely defined by a few support vectors, the algorithm can tolerate random feature distortions and perturbations. However, in many real applications, anomalous measurements are inherent to the data set due to strong environmental noise or possible sensor failures. Such anomalies arise in industrial process monitoring, video surveillance, tactical multi-modal sensing, and, more generally, any application that involves unattended sensors in difficult environments (Figure 1).
Distributed Training of Structured SVM
Lee, Ching-pei, Chang, Kai-Wei, Upadhyay, Shyam, Roth, Dan
Training structured prediction models is time-consuming. However, most existing approaches only use a single machine, thus, the advantage of computing power and the capacity for larger data sets of multiple machines have not been exploited. In this work, we propose an efficient algorithm for distributedly training structured support vector machines based on a distributed block-coordinate descent method. Both theoretical and experimental results indicate that our method is efficient.
General Vector Machine
The support vector machine (SVM) is an important class of learning machines for function approach, pattern recognition, and time-serious prediction, etc. It maps samples into the feature space by so-called support vectors of selected samples, and then feature vectors are separated by maximum margin hyperplane. The present paper presents the general vector machine (GVM) to replace the SVM. The support vectors are replaced by general project vectors selected from the usual vector space, and a Monte Carlo (MC) algorithm is developed to find the general vectors. The general project vectors improves the feature-extraction ability, and the MC algorithm can control the width of the separation margin of the hyperplane. By controlling the separation margin, we show that the maximum margin hyperplane can usually induce the overlearning, and the best learning machine is achieved with a proper separation margin. Applications in function approach, pattern recognition, and classification indicate that the developed method is very successful, particularly for small-set training problems. Additionally, our algorithm may induce some particular applications, such as for the transductive inference.
Deploying nEmesis: Preventing Foodborne Illness by Data Mining Social Media
Sadilek, Adam (University of Rochester) | Kautz, Henry (University of Rochester) | DiPrete, Lauren (Southern Nevada Health District, Las Vegas, Nevada) | Labus, Brian (Southern Nevada Health District, Las Vegas, Nevada) | Portman, Eric (University of Rochester) | Teitel, Jack (University of Rochester) | Silenzio, Vincent (University of Rochester)
Foodborne illness afflicts 48 million people annually in the U.S.alone. Over 128,000 are hospitalized and 3,000 die from the infection.While preventable with proper food safety practices, the traditional restaurant inspection process has limited impact given the predictability and low frequency of inspections, and the dynamic nature of the kitchen environment. Despite this reality, the inspection process has remained largely unchanged for decades. We apply machine learning to Twitter data and develop a system that automatically detects venues likely to pose a public health hazard.Health professionals subsequently inspect individual flagged venues in a double blind experiment spanning the entire Las Vegas metropolitan area over three months. By contrast, previous research in this domain has been limited to indirect correlative validation using only aggregate statistics. We show that adaptive inspection process is 63% more effective at identifying problematic venues than the current state of the art. The live deployment shows that if every inspection in Las Vegas became adaptive, we can prevent over 9,000 cases of foodborne illness and 557 hospitalizations annually. Additionally,adaptive inspections result in unexpected benefits, including the identification of venues lacking permits, contagious kitchen staff,and fewer customer complaints filed with the Las Vegas health department.
Fast model selection by limiting SVM training times
Demircioglu, Aydin, Horn, Daniel, Glasmachers, Tobias, Bischl, Bernd, Weihs, Claus
Kernelized Support Vector Machines (SVMs) are among the best performing supervised learning methods. But for optimal predictive performance, time-consuming parameter tuning is crucial, which impedes application. To tackle this problem, the classic model selection procedure based on grid-search and cross-validation was refined, e.g. by data subsampling and direct search heuristics. Here we focus on a different aspect, the stopping criterion for SVM training. We show that by limiting the training time given to the SVM solver during parameter tuning we can reduce model selection times by an order of magnitude.
Machine Learning Model of the Swift/BAT Trigger Algorithm for Long GRB Population Studies
Graff, Philip B, Lien, Amy Y, Baker, John G, Sakamoto, Takanori
To draw inferences about gamma-ray burst (GRB) source populations based on Swift observations, it is essential to understand the detection efficiency of the Swift burst alert telescope (BAT). This study considers the problem of modeling the Swift/BAT triggering algorithm for long GRBs, a computationally expensive procedure, and models it using machine learning algorithms. A large sample of simulated GRBs from Lien 2014 is used to train various models: random forests, boosted decision trees (with AdaBoost), support vector machines, and artificial neural networks. The best models have accuracies of $\gtrsim97\%$ ($\lesssim 3\%$ error), which is a significant improvement on a cut in GRB flux which has an accuracy of $89.6\%$ ($10.4\%$ error). These models are then used to measure the detection efficiency of Swift as a function of redshift $z$, which is used to perform Bayesian parameter estimation on the GRB rate distribution. We find a local GRB rate density of $n_0 \sim 0.48^{+0.41}_{-0.23} \ {\rm Gpc}^{-3} {\rm yr}^{-1}$ with power-law indices of $n_1 \sim 1.7^{+0.6}_{-0.5}$ and $n_2 \sim -5.9^{+5.7}_{-0.1}$ for GRBs above and below a break point of $z_1 \sim 6.8^{+2.8}_{-3.2}$. This methodology is able to improve upon earlier studies by more accurately modeling Swift detection and using this for fully Bayesian model fitting. The code used in this is analysis is publicly available online (https://github.com/PBGraff/SwiftGRB_PEanalysis).
Comparative evaluation of state-of-the-art algorithms for SSVEP-based BCIs
Oikonomou, Vangelis P., Liaros, Georgios, Georgiadis, Kostantinos, Chatzilari, Elisavet, Adam, Katerina, Nikolopoulos, Spiros, Kompatsiaris, Ioannis
Brain-computer interfaces (BCIs) have been gaining momentum in making human-computer interaction more natural, especially for people with neuro-muscular disabilities. Among the existing solutions the systems relying on electroencephalograms (EEG) occupy the most prominent place due to their non-invasiveness. However, the process of translating EEG signals into computer commands is far from trivial, since it requires the optimization of many different parameters that need to be tuned jointly. In this report, we focus on the category of EEG-based BCIs that rely on Steady-State-Visual-Evoked Potentials (SSVEPs) and perform a comparative evaluation of the most promising algorithms existing in the literature. More specifically, we define a set of algorithms for each of the various different parameters composing a BCI system (i.e. filtering, artifact removal, feature extraction, feature selection and classification) and study each parameter independently by keeping all other parameters fixed. The results obtained from this evaluation process are provided together with a dataset consisting of the 256-channel, EEG signals of 11 subjects, as well as a processing toolbox for reproducing the results and supporting further experimentation. In this way, we manage to make available for the community a state-of-the-art baseline for SSVEP-based BCIs that can be used as a basis for introducing novel methods and approaches.