Goto

Collaborating Authors

 Support Vector Machines


Statistics and Machine Learning Toolbox - MATLAB & Simulink

#artificialintelligence

Statistics and Machine Learning Toolbox provides functions and apps to describe, analyze, and model data. You can use descriptive statistics and plots for exploratory data analysis, fit probability distributions to data, generate random numbers for Monte Carlo simulations, and perform hypothesis tests. Regression and classification algorithms let you draw inferences from data and build predictive models. For multidimensional data analysis, Statistics and Machine Learning Toolbox provides feature selection, stepwise regression, principal component analysis (PCA), regularization, and other dimensionality reduction methods that let you identify variables or features that impact your model. The toolbox provides supervised and unsupervised machine learning algorithms, including support vector machines (SVMs), boosted and bagged decision trees, k-nearest neighbor, k-means, k-medoids, hierarchical clustering, Gaussian mixture models, and hidden Markov models.


5 Machine Learning Research Studies To Understand & Predict Length of Stay in Hospitals

@machinelearnbot

Length of Stay (LOS) is a critical factor in managing hospital quality & economic outcomes in Healthcare. The metric is calculated by summing the total number of days for all discharges & dividing it by the total number of discharges. Insurance programs such as Medicare are moving to a model where they are compensating Hospitals the same amount for a specific surgery (e.g. Joint replacement) regardless of the number of days spent in the hospital. Therefore, hospitals & the overall healthcare ecosystem are motivated to reduce LOS.


Machine Learning Algorithms: A Concise Technical Overview

#artificialintelligence

Whether you are a newcomer to machine learning, a newbie to specific algorithms or concepts, or a seasoned ML vet looking for a once-over of an algorithm you haven't seen or used in a while, these short and to-the-point tutorials may provide the assistance you are looking for. Each of these posts concisely covers a single, specific machine learning concept. Support Vector Machines remain a popular and time-tested classification algorithm. This post provides a high-level concise technical overview of their functionality. A wide array of clustering techniques are in use today.


A Kaggler's Guide to Model Stacking in Practice

#artificialintelligence

Stacking (also called meta ensembling) is a model ensembling technique used to combine information from multiple predictive models to generate a new model. Often times the stacked model (also called 2nd-level model) will outperform each of the individual models due its smoothing nature and ability to highlight each base model where it performs best and discredit each base model where it performs poorly. For this reason, stacking is most effective when the base models are significantly different. Here I provide a simple example and guide on how stacking is most often implemented in practice. Feel free to follow this article using the related code and datasets here in the Machine Learning Problem Bible.


"What is Relevant in a Text Document?": An Interpretable Machine Learning Approach

arXiv.org Machine Learning

Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, allowing to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability which makes it more comprehensible for humans and potentially more useful for other applications.


Microsoft R Server 9.0 now available

#artificialintelligence

Microsoft R Server 9.0, Microsoft's R distribution with added big-data, in-database, and integration capabilities, was released today and is now available for download to MSDN subscribers. This latest release is built on Microsoft R Open 3.3.2, This release includes a brand new R package for machine learning: MicrosoftML. This package provides state-of-the-art, fast and scalable machine learning algorithms for common data science tasks including featurization, classification and regression. Fast linear and logistic model functions based on the Stochastic Dual Coordinate Ascent method; Fast Forests, a random forest and quantile regression forest implementation based on FastRank, an efficient implementation of the MART gradient boosting algorithm; A neural network algorithm with support for custom, multilayer network topologies and GPU acceleration; One-class anomaly detection based on support vector machines. One-class anomaly detection based on support vector machines.


Support Vector Machines: A Simple Explanation

#artificialintelligence

In this post, we are going to introduce you to the Support Vector Machine (SVM) machine learning algorithm. We will follow a similar process to our recent post Naive Bayes for Dummies; A Simple Explanation by keeping it short and not overly-technical. The aim is to give those of you who are new to machine learning a basic understanding of the key concepts of this algorithm. Support Vector Machines - What are they? A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems and as such, this is what we will focus on in this post. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes, as shown in the image below.


Support Vector Machines for dummies; A Simple Explanation - AYLIEN

#artificialintelligence

In this post, we are going to introduce you to the Support Vector Machine (SVM) machine learning algorithm. We will follow a similar process to our recent post Naive Bayes for Dummies; A Simple Explanation by keeping it short and not overly-technical. The aim is to give those of you who are new to machine learning a basic understanding of the key concepts of this algorithm. Support Vector Machines – What are they? A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems and as such, this is what we will focus on in this post. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes, as shown in the image below.


SCOPE: Scalable Composite Optimization for Learning on Spark

arXiv.org Machine Learning

Many machine learning models, such as logistic regression~(LR) and support vector machine~(SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization~(DSO) methods have been proposed to solve the large-scale composite optimization problems, which have shown better performance than traditional batch methods. However, most of these DSO methods are not scalable enough. In this paper, we propose a novel DSO method, called \underline{s}calable \underline{c}omposite \underline{op}timization for l\underline{e}arning~({SCOPE}), and implement it on the fault-tolerant distributed platform \mbox{Spark}. SCOPE is both computation-efficient and communication-efficient. Theoretical analysis shows that SCOPE is convergent with linear convergence rate when the objective function is convex. Furthermore, empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.


The Importance of Location in Real Estate, Weather, and Machine Learning

#artificialintelligence

Real estate experts like to say that the three most important features of a property are: location, location, location! Likewise, weather events are highly location-dependent. We will see below how a similar perspective is also applicable to machine learning algorithms. In real estate, the buyer is first and foremost concerned about location for at least 3 reasons: (a) the desirability of the surrounding neighborhood; (b) the proximity to schools, businesses, services, etc.; and (c) the value of properties in that area. Similarly, meteorologists tell us that all weather is local.