Goto

Collaborating Authors

 Support Vector Machines


Understanding Support Vector Machine algorithm from examples (along with code)

#artificialintelligence

Most of the beginners start by learning regression. It is simple to learn and use, but does that solve our purpose? Because, you can do so much more than just Regression! Think of machine learning algorithms as an armory packed with axes, sword, blades, bow, dagger etc. You have various tools, but you ought to learn to use them at the right time.


"The Five Tribes of Machine Learning (And What You Can Learn from Each)," Pedro Domingos

#artificialintelligence

There are five main schools of thought in machine learning, and each has its own master algorithm โ€“ a general-purpose learner that can in principle be applied to any domain. The symbolists have inverse deduction, the connectionists have backpropagation, the evolutionaries have genetic programming, the Bayesians have probabilistic inference, and the analogizers have support vector machines. What we really need, however, is a single algorithm combining the key features of all of them. In this webinar I will summarize the five paradigms and describe my work toward unifying them, including in particular Markov logic networks. I will conclude by speculating on the new applications that a universal learner will enable, and how society will change as a result.


Support Vector Machines: A Guide for Beginners - QuantStart

#artificialintelligence

Check out my ebook on quant trading where I teach you how to build profitable systematic trading strategies with Python tools, from scratch. Take a look at my new ebook on advanced trading strategies using time series analysis, machine learning and Bayesian statistics, with Python and R. Hi! My name is Mike and I'm the guy behind QuantStart.com. I used to work in a hedge fund as a quantitative trading developer in London. Now I research, develop, backtest and implement my own intraday algorithmic trading strategies using C and Python. Quantocracy is one of the leading quant link aggregator sites. In this guide I want to introduce you to an extremely powerful machine learning technique known as the Support Vector Machine (SVM). It is one of the best "out of the box" supervised classification techniques.


Three Things About Data Science You Won't Find In the Books

#artificialintelligence

In case you haven't heard yet, Data Science is all the craze. Courses, posts, and schools are springing up everywhere. However, every time I take a look at one of those offerings, I see that a lot of emphasis is put on specific learning algorithms. Of course, understanding how logistic regression or deep learning works is cool, but once you start working with data, you find out that there are other things equally important, or maybe even more. I can't really blame these courses.


Logistic Regression vs Decision Trees vs SVM: Part II

@machinelearnbot

This is the 2nd part of the series. In this part we'll discuss how to choose between Logistic Regression, Decision Trees and Support Vector Machines. The most correct answer as mentioned in the first part of this 2 part article, still remains it depends. We'll continue our effort to shed some light on, it depends on what. All three of these techniques have certain properties inherent by their design, we'll elaborate on some in order to provide you with few pointers on their selection for your particular business problem.


Implementing your own k-nearest neighbour algorithm using Python

#artificialintelligence

In machine learning, you may often wish to build predictors that allows to classify things into categories based on some set of associated values. For example, it is possible to provide a diagnosis to a patient based on data from previous patients. Many algorithms have been developed for automated classification, and common ones include random forests, support vector machines, Naรฏve Bayes classifiers, and many types of neural networks. To get a feel for how classification works, we take a simple example of a classification algorithm โ€“ k-Nearest Neighbours (kNN) โ€“ and build it from scratch in Python 2. You can use a mostly imperative style of coding, rather than a declarative/functional one with lambda functions and list comprehensions to keep things simple if you are starting with Python. Here, we will provide an introduction to the latter approach.


Improve SVM Tuning through Parallelism

#artificialintelligence

As pointed out in the chapter 10 of "The Elements of Statistical Learning", ANN and SVM (support vector machines) share similar pros and cons, e.g. However, in contrast to ANN usually suffering from local minima solutions, SVM is always able to converge globally. In addition, SVM is less prone to over-fitting given a good choice of free parameters, which usually can be identified through cross-validations. In the R package "e1071", tune() function can be used to search for SVM parameters but is extremely inefficient due to the sequential instead of parallel executions. In the code snippet below, a parallelism-based algorithm performs the grid search for SVM parameters through the K-fold cross validation.


Machine Learning in Parallel with Support Vector Machines, Generalized Linear Models, and Adaptive Boosting

@machinelearnbot

This article describes methods for machine learning using bootstrap samples and parallel processing to model very large volumes of data in short periods of time. The R programming language includes many packages for machine learning different types of data. Three of these packages include Support Vector Machines (SVM) [1], Generalized Linear Models (GLM) [2], and Adaptive Boosting (AdaBoost) [3]. While all three packages can be highly accurate for various types of classification problems, each package performs very differently when modeling (i.e. In particular, model fitting for Generalized Linear Models execute in much shorter periods of time than either Support Vector Machines or Adaptive Boosting.


A Comparison Study of Nonlinear Kernels

arXiv.org Machine Learning

In this paper, we compare 5 different nonlinear kernels: min-max, RBF, fRBF (folded RBF), acos, and acos-$\chi^2$, on a wide range of publicly available datasets. The proposed fRBF kernel performs very similarly to the RBF kernel. Both RBF and fRBF kernels require an important tuning parameter ($\gamma$). Interestingly, for a significant portion of the datasets, the min-max kernel outperforms the best-tuned RBF/fRBF kernels. The acos kernel and acos-$\chi^2$ kernel also perform well in general and in some datasets achieve the best accuracies. One crucial issue with the use of nonlinear kernels is the excessive computational and memory cost. These days, one increasingly popular strategy is to linearize the kernels through various randomization algorithms. In our study, the randomization method for the min-max kernel demonstrates excellent performance compared to the randomization methods for other types of nonlinear kernels, measured in terms of the number of nonzero terms in the transformed dataset. Our study provides evidence for supporting the use of the min-max kernel and the corresponding randomized linearization method (i.e., the so-called "0-bit CWS"). Furthermore, the results motivate at least two directions for future research: (i) To develop new (and linearizable) nonlinear kernels for better accuracies; and (ii) To develop better linearization algorithms for improving the current linearization methods for the RBF kernel, the acos kernel, and the acos-$\chi^2$ kernel. One attempt is to combine the min-max kernel with the acos kernel or the acos-$\chi^2$ kernel. The advantages of these two new and tuning-free nonlinear kernels are demonstrated vias our extensive experiments.


Feature Selection with Annealing for Computer Vision and Big Data Learning

arXiv.org Machine Learning

Many computer vision and medical imaging problems are faced with learning from large-scale datasets, with millions of observations and features. In this paper we propose a novel efficient learning scheme that tightens a sparsity constraint by gradually removing variables based on a criterion and a schedule. The attractive fact that the problem size keeps dropping throughout the iterations makes it particularly suitable for big data learning. Our approach applies generically to the optimization of any differentiable loss function, and finds applications in regression, classification and ranking. The resultant algorithms build variable screening into estimation and are extremely simple to implement. We provide theoretical guarantees of convergence and selection consistency. In addition, one dimensional piecewise linear response functions are used to account for nonlinearity and a second order prior is imposed on these functions to avoid overfitting. Experiments on real and synthetic data show that the proposed method compares very well with other state of the art methods in regression, classification and ranking while being computationally very efficient and scalable.