 Decision Tree Learning


Decision Trees - Introduction

#artificialintelligence

Decision trees are simple and powerful types of multiple variable analysis. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. These segments form an inverted decision tree that originates with a root node at the top of the tree. The object of analysis is reflected in this root node as a simple, one-dimensional display in the decision tree interface. The name of the field of data that is the object of analysis is usually displayed, along with the spread or distribution of the values that are contained in that field.
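As a rough illustration of how an algorithm might choose one such split, the sketch below scores every candidate threshold on a single numeric field by weighted Gini impurity and keeps the best one. The field (`ages`), the target (`bought`), and the choice of Gini as the criterion are assumptions made for this example; the excerpt does not say which criterion is used.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the segment is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best binary split on one numeric field."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: no split at all
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up toy data: one field and the target that is the object of analysis.
ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
print(threshold, impurity)
```

A full tree would apply the same search recursively to each resulting segment, over every available field.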


Decision tree vs. linearly separable or non-separable pattern

@machinelearnbot

As part of a series of posts discussing how a machine learning classifier works, I ran a decision tree to classify an XY-plane, trained with XOR patterns or linearly separable patterns. Its decision boundary was drawn almost perfectly parallel to the assumed true boundary, i.e. Awful result, it appears to never follow the true boundary. Just a little improved, but it still appears to be overfitted. Even worse... it appears to get more overfitted than the case of 2-classes.
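To make the XOR case concrete, here is a minimal sketch on the four corner points of an XOR pattern. It is not the post's experiment: the data, the 0.5 thresholds, and the hand-written depth-two tree are all assumptions chosen for illustration. It shows that one axis-aligned split does no better than chance on these points, while two levels of splits separate them.

```python
from collections import Counter

# Four corners of the XOR pattern: the label is 1 when exactly one coordinate is 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [x ^ y for x, y in points]

def stump_accuracy(axis, threshold):
    """Accuracy of a single axis-aligned split that predicts the majority class per side."""
    correct = 0
    for below in (True, False):
        side = [lab for p, lab in zip(points, labels) if (p[axis] < threshold) == below]
        if side:
            correct += Counter(side).most_common(1)[0][1]
    return correct / len(points)

# Best result over both axes with a threshold between the two coordinate values.
best_stump = max(stump_accuracy(axis, 0.5) for axis in (0, 1))

def depth_two_tree(x, y):
    """Hand-written tree: split on x first, then on y inside each branch."""
    if x < 0.5:
        return 1 if y >= 0.5 else 0
    return 0 if y >= 0.5 else 1

hits = sum(depth_two_tree(x, y) == lab for (x, y), lab in zip(points, labels))
depth_two_accuracy = hits / len(points)
print(best_stump, depth_two_accuracy)
```

Because every boundary the tree draws is parallel to an axis, any diagonal true boundary can only be approximated by a staircase of such splits.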


Comments on: "A Random Forest Guided Tour" by G. Biau and E. Scornet

arXiv.org Machine Learning

This paper is a comment on the survey paper by Biau and Scornet (2016) about random forests. We focus on the problem of quantifying the impact of each ingredient of random forests on their performance. We show that such a quantification is possible for a simple pure forest, leading to conclusions that could apply more generally. Then, we consider "holdout" random forests, which are a good middle point between "toy" pure forests and Breiman's original random forests. We would like to thank G. Biau and E. Scornet for their clear and thought-provoking survey (Biau and Scornet, 2016).


Walmart and Random Forest

@machinelearnbot

In the recent Walmart Kaggle competition I used a Random Forest classifier to solve a market basket problem. A market basket model is built on the idea that there exist relationships between items purchased together. For example, a person purchasing a new toothbrush is more likely to also purchase toothpaste than motor oil in the same shopping trip. Retailers use these market basket relationships in the design of their stores for ease of use and also to increase sales. In this specific problem Walmart has broken up their shopping trips into 38 unique 'TripType' values.
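The toothbrush example can be put in numbers with a small co-occurrence count. This is only a sketch of the market basket idea, not the author's Random Forest solution; the baskets are invented toy data and `conditional_rate` is a hypothetical helper.

```python
# Invented toy baskets; each set holds the items bought in one shopping trip.
baskets = [
    {"toothbrush", "toothpaste"},
    {"toothbrush", "toothpaste", "floss"},
    {"motor oil", "wiper fluid"},
    {"toothbrush", "motor oil"},
    {"toothpaste", "floss"},
]

def conditional_rate(baskets, given, target):
    """Fraction of the baskets containing `given` that also contain `target`."""
    with_given = [basket for basket in baskets if given in basket]
    if not with_given:
        return 0.0
    return sum(target in basket for basket in with_given) / len(with_given)

p_paste = conditional_rate(baskets, "toothbrush", "toothpaste")
p_oil = conditional_rate(baskets, "toothbrush", "motor oil")
print(p_paste, p_oil)
```

In a classification setting like the one described, counts of this kind could serve as input features, with the trip type as the label to predict.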


Mastering Machine Learning with R

#artificialintelligence

Machine learning is a field of Artificial Intelligence that builds systems that learn from data. Given the growing prominence of R (a cross-platform, zero-cost statistical programming environment), there has never been a better time to start applying machine learning to your data. The book starts with an introduction to the Cross-Industry Standard Process for Data Mining. It takes you through Multivariate Regression in detail. Moving on, you will also address Classification and Regression trees. You will learn a couple of "Unsupervised techniques."


Pure Python Decision Trees

#artificialintelligence

By now we all know what a Random Forest is. We know about the great off-the-shelf performance, ease of tuning and parallelization, as well as its importance measures. It's easy for engineers implementing RF to forget about its underpinnings. Unlike some of their more modern and advanced contemporaries, decision trees are easy to interpret.
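One way to see why a single tree is easy to interpret: it can be stored as a plain nested dictionary and printed as if/else rules. The sketch below is a hypothetical illustration in pure Python, not the article's implementation; the feature names, thresholds, and class labels are written by hand rather than fitted to any data.

```python
# Hypothetical tree, written by hand for illustration (not fitted to data).
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "left": {"label": "setosa"},
    "right": {
        "feature": "petal_width", "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}

def predict(node, sample):
    """Walk from the root to a leaf, going left when the feature is below the threshold."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def to_rules(node, indent=0):
    """Render the tree as a list of indented, human-readable rule lines."""
    pad = "  " * indent
    if "label" in node:
        return [f"{pad}predict {node['label']}"]
    lines = [f"{pad}if {node['feature']} < {node['threshold']}:"]
    lines += to_rules(node["left"], indent + 1)
    lines.append(f"{pad}else:")
    lines += to_rules(node["right"], indent + 1)
    return lines

rules = "\n".join(to_rules(tree))
prediction = predict(tree, {"petal_length": 4.8, "petal_width": 1.5})
print(rules)
print(prediction)
```

Every prediction can be traced to the short chain of comparisons that produced it, which is much harder to do for a forest of hundreds of trees.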


Choosing features for random forests algorithm

@machinelearnbot

There are many ways to choose features with given data, and it is always a challenge to pick the ones with which a particular algorithm will work better. Here I will consider data from monitoring performance of physical exercises with wearable accelerometers, for example, wrist bands. The data for this project come from this source: http://groupware.les.inf.puc-rio.br/har. In this project, researchers used data from accelerometers on the belt, forearm, arm, and dumbbell of a few participants. They were asked to perform barbell lifts correctly, marked as "A", and incorrectly with four typical mistakes, marked as "B", "C", "D" and "E".


3 Must-Ask Questions Before Choosing That Machine Learning Algorithm!

@machinelearnbot

You know that you want to build a predictive model. You've framed your problem in terms of classification or regression. You've prepared some training data (which took an age). You've heard or experienced first hand that Random Forests, Elastic Net Regression or Deep Belief Networks are "the business" and so you're going to use one of these (you've probably already verified that these algorithms are appropriate to your problem based on their general capabilities: whether it be their ability to deal with real valued data, "big" streaming data, multiple classes and so on). However, no two algorithms are the same (if they were we'd simply have fewer to choose from). As such there are a host of questions that you may not have even thought to ask which could make or break your choice.


Best Machine Learning, Data Mining, & NLP Books for Data Scientists and Machine Learning Engineers

#artificialintelligence

Top Machine Learning & Data Mining Books - for this post, we have scraped various signals (e.g. We have combined all signals to compute a Quality Score for each book and publish the list of top Machine Learning and Data Mining books. The readers will love the list because it is data-driven & objective. This book is very well rated on the Amazon website and is written by three professors from USC, Stanford and University of Washington. The three authors, Gareth James, Daniela Witten, and Trevor Hastie, all have backgrounds in statistics.


owocki/pytrader

#artificialintelligence

I built this as a side project in January / February 2016, as a practical means of getting some experience with machine learning, quantitative finance, and of course hopefully making some profit ;). Here's an example of a Decision Tree classifier being used to make a buy (blue), sell (red), or hold (green) decision on the BTC_ETH pair. On both graphs, the x axis is a recent price movement, and the y axis is a previous price movement, the length of which is determined by a parameter called granularity. These graphs show only the last two price movements. The graphing library used is constrained by two-dimensional space, but you could generate a classifier that acts upon n price movements (n-dimensional space).
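The two plotted features might be derived along these lines. This is a guess at the general idea, not pytrader's code: `movement_features` is a hypothetical helper, "movement" is assumed to mean the relative price change over `granularity` steps, and the price series is made up.

```python
def movement_features(prices, granularity):
    """Return (recent, previous) relative price movements, each spanning `granularity` steps."""
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    if len(prices) < 2 * granularity + 1:
        raise ValueError("need at least 2 * granularity + 1 prices")
    latest = prices[-1]
    middle = prices[-1 - granularity]
    oldest = prices[-1 - 2 * granularity]
    recent = (latest - middle) / middle      # x axis: the most recent movement
    previous = (middle - oldest) / oldest    # y axis: the movement before that
    return recent, previous

# Made-up price series, oldest first.
prices = [100.0, 102.0, 101.0, 105.0, 110.0]
x, y = movement_features(prices, granularity=2)
print(x, y)
```

Extending the same window further back would give n such movements, one per dimension, for a classifier that works in n-dimensional space.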