 Support Vector Machines


An introduction to Support Vector Machines (SVM) MonkeyLearn Blog

@machinelearnbot

You're refining your training set, and maybe you've even tried stuff out using Naive Bayes. But now you're feeling confident in your dataset and want to take it one step further. Enter Support Vector Machines (SVM): a fast and dependable classification algorithm that performs very well with a limited amount of data. Perhaps you have dug a bit deeper and run into terms like linearly separable, kernel trick and kernel functions. The idea behind the SVM algorithm is simple, and applying it to natural language classification doesn't require most of the complicated stuff.


Modified Frank-Wolfe Algorithm for Enhanced Sparsity in Support Vector Machine Classifiers

arXiv.org Machine Learning

Regularization is an essential mechanism in Machine Learning that usually refers to the set of techniques that attempt to improve the estimates by biasing them away from their sample-based values towards values that are deemed to be more "physically plausible" [1]. In practice, it is often used to avoid overfitting, to use some prior knowledge about the problem at hand, or to induce some desirable properties in the resulting learning machine. One of these properties is the so-called sparsity, which can be roughly defined as expressing the learning machine using only a part of the training information. This has advantages in terms of the interpretability of the model and its manageability, and it also helps prevent overfitting. Two representatives of this type of model are the Support Vector Machines (SVM [2]) and the Lasso model [3], which induce sparsity at two different levels. On the one hand, SVMs are sparse in their representation in terms of the training patterns, which means that the model is characterized only by a subsample of the original training dataset. On the other hand, Lasso models induce sparsity at the level of the features, in the sense that the model is defined only as a function of a subset of the inputs, hence performing an implicit feature selection.
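The two levels of sparsity described above can be written out in standard textbook notation. This is a sketch for orientation, not the formulation used in the paper: the symbols (n training patterns x_i with labels y_i, kernel k, dual coefficients alpha_i, weight vector w, penalty lambda) are assumptions introduced here.

```latex
% SVM: sparse in the training patterns.
% Only the support vectors (alpha_i > 0) contribute to the decision function.
f(x) = \sum_{i=1}^{n} \alpha_i \, y_i \, k(x_i, x) + b,
\qquad \alpha_i = 0 \ \text{for every pattern that is not a support vector}

% Lasso: sparse in the features.
% The l1 penalty drives many coefficients w_j exactly to zero.
\min_{w} \ \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1,
\qquad w_j = 0 \ \text{for every feature that is not selected}
```

In the first expression the sum runs over training patterns, so zeros remove patterns; in the second the zeros sit in the coefficient vector, so they remove input features.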


Rgtsvm: Support Vector Machines on a GPU in R

arXiv.org Machine Learning

Rgtsvm provides a fast and flexible support vector machine (SVM) implementation for the R language. The distinguishing feature of Rgtsvm is that support vector classification and support vector regression tasks are implemented on a graphics processing unit (GPU), allowing the libraries to scale to millions of examples with >100-fold improvement in performance over existing implementations. Nevertheless, Rgtsvm retains feature parity and has an interface that is compatible with the popular e1071 SVM package in R. Altogether, Rgtsvm enables large SVM models to be created by both experienced and novice practitioners.


Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data

#artificialintelligence

Broiler farmers have used data as an aid to health and production management for over 40 years [1,2]. Food and water consumption, growth and mortality have been used to construct standard production curves to monitor and improve performance. Daily flock data are plotted graphically on broiler house 'door charts' and deviations used as early indicators of flock health and welfare [3]. Increasingly, these and other sensor-recorded data are being collected electronically, giving birth to the concept of precision livestock farming [4]. Broiler flocks generate large datasets.


indrajithi/mgc-django

@machinelearnbot

Music is categorized into subjective categories called genres. With the growth of the internet and multimedia systems, applications that deal with musical databases have gained importance, and demand for Music Information Retrieval (MIR) applications has increased. Musical genres have no strict definitions or boundaries, as they arise through a complex interaction between the public, marketing, historical, and cultural factors. This is a web application that classifies music into genres. Our web application is written in Python using the Django framework.


Predictive modelling of training loads and injury in Australian football

arXiv.org Machine Learning

To investigate whether training load monitoring data could be used to predict injuries in elite Australian football players, data were collected from elite athletes over 3 seasons at an Australian football club. Loads were quantified using GPS devices, accelerometers and player perceived exertion ratings. Absolute and relative training load metrics were calculated for each player each day (rolling average, exponentially weighted moving average, acute:chronic workload ratio, monotony and strain). Injury prediction models (regularised logistic regression, generalised estimating equations, random forests and support vector machines) were built for non-contact, non-contact time-loss and hamstring-specific injuries using the first two seasons of data. Injury predictions were generated for the third season and evaluated using the area under the receiver operating characteristic curve (AUC). Predictive performance was only marginally better than chance for models of non-contact and non-contact time-loss injuries (AUC < 0.65). The best performing model was a multivariate logistic regression for hamstring injuries (best AUC = 0.76). Learning curves suggested logistic regression was underfitting the load-injury relationship and that using a more complex model or increasing the amount of model building data may lead to future improvements. Injury prediction models built using training load data from a single club showed poor ability to predict injuries when tested on previously unseen data, suggesting they are limited as a daily decision tool for practitioners. Focusing the modelling approach on specific injury types and increasing the amount of training data may lead to the development of improved predictive models for injury prevention.
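The training load metrics named in the abstract can be illustrated with a small sketch. The abstract does not give window lengths or exact formulas, so everything specific here is an assumption: 7-day acute and 28-day chronic windows, an EWMA decay of 2 / (span + 1), monotony as weekly mean divided by the population standard deviation, strain as weekly total times monotony, and the invented `daily_loads` values.

```python
from statistics import mean, pstdev


def rolling_average(loads, window):
    """Mean of the last `window` daily loads for each day (shorter at the start)."""
    out = []
    for i in range(len(loads)):
        start = max(0, i - window + 1)
        out.append(mean(loads[start:i + 1]))
    return out


def ewma(loads, span):
    """Exponentially weighted moving average with decay 2 / (span + 1) (assumed)."""
    lam = 2.0 / (span + 1)
    out = [float(loads[0])]
    for load in loads[1:]:
        out.append(lam * load + (1 - lam) * out[-1])
    return out


def acute_chronic_ratio(loads, acute=7, chronic=28):
    """Acute:chronic workload ratio per day; None where the chronic load is zero."""
    a = rolling_average(loads, acute)
    c = rolling_average(loads, chronic)
    return [x / y if y > 0 else None for x, y in zip(a, c)]


def monotony_and_strain(week):
    """Monotony = weekly mean / weekly SD; strain = weekly total * monotony."""
    sd = pstdev(week)
    if sd == 0:
        return None, None  # undefined when every day has the same load
    monotony = mean(week) / sd
    return monotony, sum(week) * monotony


# Invented example: the same 7-day pattern repeated for 28 days.
daily_loads = [300, 450, 0, 500, 350, 600, 0] * 4
acwr = acute_chronic_ratio(daily_loads)
smoothed = ewma(daily_loads, span=7)
monotony, strain = monotony_and_strain(daily_loads[-7:])
```

Because the invented load pattern repeats every week, the acute and chronic averages coincide on the last day and the ratio there is 1. Features like these would then be fed, per player and per day, into the prediction models listed in the abstract.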


Recursive Multikernel Filters Exploiting Nonlinear Temporal Structure

arXiv.org Machine Learning

In kernel methods, temporal information on the data is commonly included by using time-delayed embeddings as inputs. Recently, an alternative formulation was proposed by defining a gamma-filter explicitly in a reproducing kernel Hilbert space, giving rise to a complex model where multiple kernels operate on different temporal combinations of the input signal. In the original formulation, the kernels are then simply combined to obtain a single kernel matrix (for instance by averaging), which provides computational benefits but discards important information on the temporal structure of the signal. Inspired by works on multiple kernel learning, we overcome this drawback by considering the different kernels separately. We propose an efficient strategy to adaptively combine and select these kernels during the training phase. The resulting batch and online algorithms automatically learn to process highly nonlinear temporal information extracted from the input signal, which is implicitly encoded in the kernel values. We evaluate our proposal on several artificial and real tasks, showing that it can outperform classical approaches both in batch and online settings.
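To make two ideas from the abstract concrete, here is a minimal sketch of (a) a time-delay embedding used as kernel input and (b) several kernels, one per time lag, combined into a single matrix by averaging. It is not the paper's gamma-filter or its adaptive combination strategy; the Gaussian kernel, the function names, the kernel width and the toy signal are all assumptions made for illustration.

```python
import math


def delay_embed(signal, dim):
    """Return vectors [x[t], x[t-1], ..., x[t-dim+1]] for every valid t."""
    return [list(reversed(signal[t - dim + 1:t + 1]))
            for t in range(dim - 1, len(signal))]


def gaussian_kernel(u, v, width=1.0):
    """Gaussian kernel exp(-||u - v||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def kernel_matrix(points, width=1.0):
    """Kernel matrix over a list of equally sized vectors."""
    return [[gaussian_kernel(p, q, width) for q in points] for p in points]


def lagged_kernels(signal, max_lag, width=1.0):
    """One kernel matrix per lag d, built from the scalar samples x[t - d]."""
    times = range(max_lag, len(signal))
    return [[[gaussian_kernel([signal[s - d]], [signal[t - d]], width)
              for t in times] for s in times]
            for d in range(max_lag + 1)]


def average_kernels(matrices):
    """Element-wise mean of several equally sized kernel matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]


signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0]  # toy signal

# (a) classical approach: one kernel on time-delayed embeddings
embedded = delay_embed(signal, dim=3)
K_embedded = kernel_matrix(embedded)

# (b) one kernel per lag, then collapsed into a single matrix by averaging
per_lag = lagged_kernels(signal, max_lag=2)
K_avg = average_kernels(per_lag)
```

After averaging, `K_avg` no longer records which lag contributed which similarity. Keeping the matrices in `per_lag` separate is what would let a learner weight or select them individually, which is the direction the abstract describes.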


Top 10 Amazon Books in Artificial Intelligence & Machine Learning, 2016 Edition

@machinelearnbot

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more.


A budget-constrained inverse classification framework for smooth classifiers

arXiv.org Machine Learning

Inverse classification is the process of manipulating an instance such that it is more likely to conform to a specific class. Past methods that address such a problem have shortcomings. Greedy methods make changes that are overly radical, often relying on data that is strictly discrete. Other methods rely on certain data points, the presence of which cannot be guaranteed. In this paper we propose a general framework and method that overcomes these and other limitations. The formulation of our method can use any differentiable classification function. We demonstrate the method by using logistic regression and Gaussian kernel SVMs. We constrain the inverse classification to occur on features that can actually be changed, each of which incurs an individual cost. We further subject such changes to fall within a certain level of cumulative change (budget). Our framework can also accommodate the estimation of (indirectly changeable) features whose values change as a consequence of actions taken. Furthermore, we propose two methods for specifying feature-value ranges that result in different algorithmic behavior. We apply our method, and a proposed sensitivity analysis-based benchmark method, to two freely available datasets: Student Performance from the UCI Machine Learning Repository and a real world cardiovascular disease dataset. The results obtained demonstrate the validity and benefits of our framework and method.


Support Vector Machines: A Simple Explanation

@machinelearnbot

In this post, we are going to introduce you to the Support Vector Machine (SVM) machine learning algorithm. We will follow a similar process to our recent post Naive Bayes for Dummies; A Simple Explanation by keeping it short and not overly technical. The aim is to give those of you who are new to machine learning a basic understanding of the key concepts of this algorithm. A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems and, as such, this is what we will focus on in this post.
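For readers who want to see the classification case in code, here is a minimal sketch of a linear SVM trained by subgradient descent on the regularised hinge loss. It is one simple way to fit such a model, not the method from the post and not how production libraries do it; the function names, hyperparameters and toy data are all made up for illustration.

```python
def train_linear_svm(X, y, lam=0.01, epochs=100, lr=0.01):
    """Fit w, b by per-sample subgradient steps on lam/2 * ||w||^2 + hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move towards it.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: only the regulariser acts.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (invented): two groups on either side of the origin.
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
     [2.0, 1.0], [1.0, 2.0], [1.5, 1.5]]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

The `margin < 1` test is what makes this an SVM rather than a plain perceptron: points are pushed not just onto the correct side of the boundary but at least a unit margin away from it, and the points that end up sitting on that margin play the role of the support vectors.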