 Decision Tree Learning


Data Science Primer: Basic Concepts for Beginners

@machinelearnbot

This post will provide an overview of bagging, boosting, and stacking, arguably the most used and well-known of the basic ensemble methods. They are not, however, the only options. Random Forests is another example of an ensemble learner, which uses numerous decision trees in a single predictive model, and which is often overlooked and treated as a "regular" algorithm. There are other approaches to selecting effective algorithms as well, treated below.
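As a rough illustration of the bagging idea mentioned above, here is a minimal sketch in plain Python. It is not taken from the post: the helper names (`fit_stump`, `fit_bagging`, `bagging_predict`), the one-dimensional threshold "stumps" used as base learners, the binary 0/1 labels, and the toy data are all assumptions made for the example. Each base learner is trained on a bootstrap sample (drawn with replacement), and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D threshold rule on binary 0/1 labels.

    Returns (threshold, left, right): predict `right` when x > threshold,
    otherwise `left`. Falls back to a constant majority-class rule.
    """
    majority = Counter(ys).most_common(1)[0][0]
    # Baseline: always predict the majority class (threshold = +inf).
    best = (sum(y != majority for y in ys), float("inf"), majority, majority)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (right if x > threshold else left) != y for x, y in zip(xs, ys)
            )
            if errors < best[0]:
                best = (errors, threshold, left, right)
    return best[1:]


def stump_predict(stump, x):
    threshold, left, right = stump
    return right if x > threshold else left


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Majority vote over the base learners."""
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (assumed): label is 1 when x >= 5.
xs = list(range(10))
ys = [0] * 5 + [1] * 5
models = fit_bagging(xs, ys)
print(bagging_predict(models, 0), bagging_predict(models, 9))
```

Boosting and stacking differ in how the base learners are trained and combined; this sketch covers only the bagging case.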


Churn Prediction With Apache Spark Machine Learning - DZone AI

#artificialintelligence

Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a subscription to a service. Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, insurance firms, and other verticals. The prediction process is heavily data-driven and often utilizes advanced machine learning techniques. In this post, we'll take a look at what types of customer data are typically used, do some preliminary analysis of the data, and generate churn prediction models -- all with Spark and its machine learning frameworks.


Canonical Correlation Forests

arXiv.org Machine Learning

We introduce canonical correlation forests (CCFs), a new decision tree ensemble method for classification and regression. Individual canonical correlation trees are binary decision trees with hyperplane splits based on local canonical correlation coefficients calculated during training. Unlike axis-aligned alternatives, the decision surfaces of CCFs are not restricted to the coordinate system of the input features and therefore more naturally represent data with correlated inputs. CCFs naturally accommodate multiple outputs, provide a similar computational complexity to random forests, and inherit their impressive robustness to the choice of input parameters. As part of the CCF training algorithm, we also introduce projection bootstrapping, a novel alternative to bagging for oblique decision tree ensembles which maintains use of the full dataset in selecting split points, often leading to improvements in predictive accuracy. Our experiments show that, even without parameter tuning, CCFs outperform axis-aligned random forests and other state-of-the-art tree ensemble methods on both classification and regression problems, delivering both improved predictive accuracy and faster training times. We further show that they outperform all of the 179 classifiers considered in a recent extensive survey.


When Does Deep Learning Work Better Than SVMs or Random Forests?

@machinelearnbot

Guest blog by Sebastian Raschka, originally posted here. If we tackle a supervised learning problem, my advice is to start with the simplest hypothesis space first. I.e., try a linear model such as logistic regression. If this doesn't work "well" (i.e., it doesn't meet our expectation or performance criterion that we defined earlier), I would move on to the next experiment. I would say that random forests are probably THE "worry-free" approach - if such a thing exists in ML: There are no real hyperparameters to tune (maybe except for the number of trees; typically, the more trees we have the better).


Price Optimisation Using Decision Tree (Regression Tree) - Machine Learning

@machinelearnbot

The research was conducted to find the price that maximises profit without sacrificing demand because the price is too high, or margin because the price is too low. The goal is to experiment with different price levels for the same product in one marketplace and country, to see how sales volumes change with price and what volume can be sold within the optimal price range. As a data scientist, it is my responsibility to identify the optimum prices of products so the items can be sold for maximum profit. Sales managers and small business owners must decide at what price to sell each of their products in each marketplace or country in order to maximise profit. With each product line added and many products to monitor, it is very difficult to determine the optimum price for each product.


Decision tree vs. linearly separable or non-separable pattern

@machinelearnbot

As a part of a series of posts discussing how a machine learning classifier works, I ran a decision tree to classify an XY-plane, trained with XOR patterns or linearly separable patterns. Its decision boundary was drawn almost perfectly parallel to the assumed true boundary, i.e. Awful result, it appears to never follow the true boundary. Just a little improved, but it still appears to be overfitted. Even worse... it appears to get more overfitted than the case of 2 classes.
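To make the XOR case concrete, the following is a small sketch in plain Python. It is not the author's experiment: the four-point XOR dataset, the candidate thresholds, and the function names are assumptions chosen for illustration. It shows that no single axis-aligned split does better than chance on XOR, while two levels of axis-aligned splits separate the four points exactly.

```python
from itertools import product

# Four XOR points: the label is 1 when exactly one coordinate is 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [x ^ y for x, y in points]


def stump_accuracy(feature, threshold, left_label):
    """Accuracy of a one-split tree.

    Predicts `left_label` when point[feature] <= threshold,
    and the other class otherwise.
    """
    correct = 0
    for point, label in zip(points, labels):
        pred = left_label if point[feature] <= threshold else 1 - left_label
        correct += pred == label
    return correct / len(points)


# Try every feature, every useful threshold, and both label assignments.
best_stump = max(
    stump_accuracy(f, t, l)
    for f, t, l in product((0, 1), (-0.5, 0.5, 1.5), (0, 1))
)


def depth2_predict(point):
    """Hand-written depth-2 tree: split on x first, then on y in each half."""
    x, y = point
    if x <= 0.5:
        return 1 if y > 0.5 else 0
    return 0 if y > 0.5 else 1


depth2_accuracy = sum(
    depth2_predict(p) == l for p, l in zip(points, labels)
) / len(points)

print("best single split:", best_stump)
print("depth-2 tree:", depth2_accuracy)
```

With noisy samples around these four corners rather than the exact points, a tree grown to full depth can keep adding splits to fit the noise, which is one way the overfitted boundaries described above can arise.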


Regression Trees - What is the best reference?

@machinelearnbot

If you're interested in the full article then please let me know and I might be able to send you the full version] 2) Kohavi, R. and Quinlan, R., (1999).


Determining the BEST decision tree.

@machinelearnbot

A couple of quick points. First, do a quick Google search on "Gains table", "Gains chart", and "lift chart" and you'll find some good info about comparing how good various models are. Independent of traditional measures of model performance (which typically look at performance across the full dataset), it's also possible that models that are not ideal for some purposes might still reveal some important findings or insights. E.g., a tree model might not do a great job overall, but it might identify a fraction of the data (a small terminal node) that has a very high percentage of targets. Depending on your domain, this small terminal node could be valuable (e.g., everyone in that group is likely to be committing tax fraud, or is likely to have cancer, etc.). Also, just seeing which variables are important for the prediction can have value.
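For readers who have not seen a gains table, here is a minimal sketch in plain Python of one common way to build it. The function name `gains_table`, the column names, the choice of five bins, and the toy scores are assumptions for illustration, not something from the post. Records are ranked by model score, cut into equal-sized bins, and each bin reports its share of targets (lift) and the cumulative share captured so far (gain). The sketch assumes at least one positive target and no more bins than records.

```python
def gains_table(scores, targets, n_bins=5):
    """Build a simple gains/lift table.

    scores  : model scores, higher means more likely to be a target
    targets : 0/1 outcomes aligned with scores
    """
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_targets = sum(targets)
    overall_rate = total_targets / n
    rows = []
    cumulative = 0
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        hits = sum(t for _, t in chunk)
        cumulative += hits
        rows.append({
            "bin": b + 1,
            "count": len(chunk),
            "targets": hits,
            # Share of all targets captured in this bin and the ones above it.
            "cum_gain": cumulative / total_targets,
            # Target rate in this bin relative to the overall target rate.
            "lift": (hits / len(chunk)) / overall_rate,
        })
    return rows


# Toy example (assumed data): ten scored records, four of them targets.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
targets = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
for row in gains_table(scores, targets):
    print(row)
```

A small terminal node with a very high percentage of targets, as described above, would show up here as a top bin with a lift well above 1 even if the model's overall accuracy is unremarkable.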


R-squared for Decision Tree

@machinelearnbot

I use the methodology you speak of all the time. I was the original programmer for Breiman and Stone's version of CART in the late '70s, which is where I believe I was first introduced to that method. However, we were very careful to use the term "variation explained", since there is little relationship to the theoretical Pearson "r". Be aware that this value can go negative, which implies that parts of your model have much higher variation than the population variance.
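The point about negative values can be checked with a short sketch. This assumes the usual definition of variation explained as 1 - SS_res / SS_tot, which the reply does not spell out; the function name and the toy numbers are made up for illustration.

```python
def variation_explained(y_true, y_pred):
    """Return 1 - SS_res / SS_tot.

    SS_tot is the squared deviation of y_true from its own mean;
    SS_res is the squared deviation of y_true from the predictions.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = [1.0, 2.0, 3.0, 4.0]    # predictions match exactly
mean_only = [2.5, 2.5, 2.5, 2.5]  # always predict the mean
reversed_ = [4.0, 3.0, 2.0, 1.0]  # worse than predicting the mean

print(variation_explained(y_true, perfect))    # 1.0
print(variation_explained(y_true, mean_only))  # 0.0
print(variation_explained(y_true, reversed_))  # -3.0
```

Whenever the residual sum of squares exceeds the total sum of squares, the predictions do worse than simply predicting the mean and the value drops below zero, which is why it should not be read as the square of a correlation.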


Decision Trees, Classification & Interpretation Using SciKit-Learn

@machinelearnbot

This article is by Jitesh Shah, a data & stats jockey in perpetual beta, located in Fremont, California. This article includes the data set and Python code. Wouldn't it be nice if defects and product failures could be predicted in advance? We've got the data on the attributes, design features, and manufacturing processes that come together to create that product, and we have defect and failure rate data, so all we have to do is connect the two and use that to predict which sets of features, attributes, and processes in combination cause these defects. That was probably a non-trivial endeavor in the past, but now, with the ability to store and process vast amounts of data (no secret there), it is no big deal.