 Support Vector Machines


Discriminating sample groups with multi-way data

arXiv.org Machine Learning

High-dimensional linear classifiers, such as the support vector machine (SVM) and distance weighted discrimination (DWD), are commonly used in biomedical research to distinguish groups of subjects based on a large number of features. However, their use is limited to applications where a single vector of features is measured for each subject. In practice, data are often multi-way, or measured over multiple dimensions. For example, metabolite abundance may be measured over multiple regions or tissues, or gene expression may be measured over multiple time points, for the same subjects. We propose a framework for linear classification of high-dimensional multi-way data, in which coefficients can be factorized into weights that are specific to each dimension. More generally, the coefficients for each measurement in a multi-way dataset are assumed to have low-rank structure. This framework extends existing classification techniques, and we have implemented multi-way versions of SVM and DWD. We describe informative simulation results, and apply multi-way DWD to data for two very different clinical research studies. The first study uses metabolite magnetic resonance spectroscopy data over multiple brain regions to compare patients with and without spinocerebellar ataxia; the second uses publicly available gene expression time-course data to compare treatment responses for patients with multiple sclerosis. Our method improves performance and simplifies interpretation over naive applications of full rank linear classification to multi-way data. An R package is available at https://github.com/lockEF/MultiwayClassification.
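
The factorization mentioned in the abstract can be illustrated with a small sketch. It assumes the simplest case only: each subject is a features-by-dimensions matrix X, and the coefficient matrix is rank 1, so that B[i][j] = w[i] * v[j]. The function names and toy numbers below are hypothetical and are not taken from the MultiwayClassification package; the sketch shows the scoring step, not how the weights are fitted.

```python
# Illustrative sketch only: rank-1 coefficients for two-way data.
# All names and numbers are hypothetical, not from the R package.

def outer(w, v):
    """Rank-1 coefficient matrix B with B[i][j] = w[i] * v[j]."""
    return [[wi * vj for vj in v] for wi in w]

def full_score(X, B, b=0.0):
    """Linear score using one coefficient per measurement."""
    return sum(B[i][j] * X[i][j]
               for i in range(len(X))
               for j in range(len(X[0]))) + b

def rank1_score(X, w, v, b=0.0):
    """Same score computed from the factorized weights: w^T X v + b."""
    return sum(w[i] * sum(X[i][j] * v[j] for j in range(len(v)))
               for i in range(len(w))) + b

# One subject: 3 features (rows) measured over 2 dimensions (columns).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [1.0, -2.0, 0.5]   # weights specific to the features
v = [0.5, 2.0]         # weights specific to the second dimension

score_full = full_score(X, outer(w, v))
score_rank1 = rank1_score(X, w, v)

# Number of coefficients: full matrix versus factorized weights.
n_params_full = len(w) * len(v)
n_params_rank1 = len(w) + len(v)
```

Both scores agree, since the sum of w[i] * v[j] * X[i][j] over all entries is the same quantity as w^T X v. With p features and d dimensions, the full model carries p * d coefficients while the rank-1 form carries p + d weights, which is where the simpler interpretation comes from.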


Support Vector Machines for dummies; A Simple Explanation

#artificialintelligence

In this post, we are going to introduce you to the Support Vector Machine (SVM) machine learning algorithm. We will follow a similar process to our recent post Naive Bayes for Dummies; A Simple Explanation by keeping it short and not overly technical. The aim is to give those of you who are new to machine learning a basic understanding of the key concepts of this algorithm. A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems, so that is what we will focus on in this post.


District Data Labs - Visual Diagnostics for More Informed Machine Learning: Part 2

#artificialintelligence

Note: Before starting Part 2, be sure to read Part 1! When it comes to machine learning, ultimately the most important picture to have is the big picture. Whether it's logistic regression, random forests, Bayesian methods, support vector machines, or neural nets, everyone seems to have their favorite! Unfortunately, these discussions tend to truncate the challenges of machine learning into a single problem, which is a particularly problematic misrepresentation for people who are just getting started with machine learning. Sure, picking a good model is important, but it's certainly not enough (and it's debatable whether a model can actually be 'good' devoid of the context of the domain, the hypothesis, the shape of the data, and the intended application). In this post we'll discuss model selection in the context of the big picture, which I'll present in terms of the model selection triple, and we'll explore a set of visual tools for navigating the triple.


Machine Learning in Manufacturing – Using Artificial Intelligence to Optimize Processes

#artificialintelligence

As the manufacturing industry is moving away from the traditional long-term service contract to an 'Analytics-as-a-Service' model, big data applications are increasingly being used to collect data from manufacturing operations. Using big data, you can accurately predict failure in operations well ahead of time, increasing the service revenue and reducing the cost of service. In addition, you can predict the health of your equipment in real time, and release equipment for maintenance only when necessary. Through the use of neural networks, support vector machines, and decision trees, you can identify complex interdependencies within operational parameters and detect anomalies that lead to equipment failures.


What are the Best Machine Learning Packages in R? R-bloggers

#artificialintelligence

The most common question asked by prospective data scientists is – "What is the best programming language for Machine Learning?" The answer to this question always results in a debate about whether to choose R, Python or MATLAB for Machine Learning. Nobody can, in reality, answer the question as to whether Python or R is the best language for Machine Learning. However, the programming language one should choose for machine learning directly depends on the requirements of a given data problem, the likes and preferences of the data scientist and the context of machine learning activities they want to perform. According to a survey on Kaggler's Favourite Tools, the open source R programming language turned out to be the favourite among 543 of the 1,714 Kagglers listing their data science tools.


How to Select Support Vector Machine Kernels

#artificialintelligence

Given an arbitrary dataset, you typically don't know which kernel may work best. I recommend starting with the simplest hypothesis space first -- given that you don't know much about your data -- and working your way up towards the more complex hypothesis spaces. So, the linear kernel works fine if your dataset is linearly separable; however, if your dataset isn't linearly separable, a linear kernel isn't going to cut it (almost in a literal sense ;)). For simplicity (and visualization purposes), let's assume our dataset consists of 2 dimensions only. Now, it looks like both linear and RBF kernel SVM would work equally well on this dataset.
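
To make the linear-versus-RBF point concrete, here is a minimal, dependency-free sketch. Note the substitution: it trains a kernel perceptron, not an SVM, because that fits in a few lines of plain Python while still showing what the choice of kernel changes. The 2-dimensional toy datasets, the function names, and the values gamma=1.0 and epochs=50 are all assumptions made for illustration.

```python
import math

def linear_kernel(a, b):
    """Plain dot product: the simplest hypothesis space."""
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; gamma=1.0 is an arbitrary choice here."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def decision(alpha, X, y, kernel, x):
    """Kernel expansion; the +1.0 acts as a bias term."""
    return sum(a * yj * (kernel(xj, x) + 1.0)
               for a, yj, xj in zip(alpha, y, X))

def train_kernel_perceptron(X, y, kernel, epochs=50):
    """Kernel perceptron (a stand-in for an SVM, not an SVM itself)."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, xi in enumerate(X):
            if y[i] * decision(alpha, X, y, kernel, xi) <= 0:
                alpha[i] += 1      # count one more mistake on sample i
                mistakes += 1
        if mistakes == 0:          # every training point is classified
            break
    return alpha

def accuracy(alpha, X, y, kernel):
    """Training accuracy; a score of exactly 0 counts as class -1."""
    preds = [1 if decision(alpha, X, y, kernel, x) > 0 else -1 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y_separable = [-1, -1, 1, 1]   # label depends on the first coordinate
y_xor = [-1, 1, 1, -1]         # XOR labels: not linearly separable

acc_linear_separable = accuracy(
    train_kernel_perceptron(X, y_separable, linear_kernel),
    X, y_separable, linear_kernel)
acc_linear_xor = accuracy(
    train_kernel_perceptron(X, y_xor, linear_kernel),
    X, y_xor, linear_kernel)
acc_rbf_xor = accuracy(
    train_kernel_perceptron(X, y_xor, rbf_kernel),
    X, y_xor, rbf_kernel)

print(acc_linear_separable, acc_linear_xor, acc_rbf_xor)
```

On the separable labels the linear kernel classifies every training point correctly, so there is no reason to reach for anything more complex. On the XOR labels no linear boundary can get all four points right, while the RBF kernel can, which is the situation where moving up to a richer hypothesis space pays off.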


Machine Learning vs Predictive Modeling

#artificialintelligence

The above question seems to haunt most people who were doing statistical predictive modeling before the term machine learning came into play. Nowadays it seems that whoever has run a classification problem with any of the advanced algorithms, like neural networks or support vector machines, calls themselves a machine learning expert. But is this machine learning? We have had these statistical/mathematical models since the early 1960s. And the only reason not all of them were popular back then is that they were too advanced for the computing power available at the time.


Logistic Regression Vs Decision Trees Vs SVM: Part I - Edvancer Eduventures

#artificialintelligence

Classification is one of the major problems that we solve while working on standard business problems across industries. In this article we'll be discussing three of the major techniques used for it: Logistic Regression, Decision Trees and Support Vector Machines [SVM]. All of the above listed algorithms are used in classification [SVM and Decision Trees are also used for regression, but we are not discussing that today!]. Time and again I have seen people asking which one to choose for their particular problem. The classic, most correct, but least satisfying response to that question is "it depends!".


Wrapping up Python into a Cloud-based PostgreSQL

#artificialintelligence

Specifically, using PL/Python, one can bring in countless Python libraries to process data close to the database. Here I will talk about my efforts to bring in the functionality of PySAL, a spatial analytics library written in Python and developed largely by Serge Rey et al. at Arizona State University. PySAL makes available robust exploratory spatial data analysis related to spatial cluster and outlier detection, hotspot detection, spatial regression, and much more. Besides the wrappers we wrote for PySAL, we have written classes for bringing in machine learning methods such as random forest, linear regression, support vector machines, and neural networks from scikit-learn and TensorFlow. This talk will specifically cover the challenges we encountered programming in the PL/Python environment, collaborations with some of the PySAL developers, and the power of having spatial statistics and machine learning capabilities baked right into a cloud database.