 Decision Tree Learning


How to Bin or Convert Numerical Variables to Categorical Variables with Decision Trees

@machinelearnbot

This is a guest repost by Jacob Joseph from CleverTap. Why would you want to convert a numerical variable into a categorical one? Depending on the situation, creating bins for the numerical variable can make it easier to interpret, allow quick segmentation, or simply provide an additional feature for your predictive model. Binning is a popular feature engineering technique. Suppose your hypothesis is that the age of a customer is correlated with their tendency to interact with a mobile app.
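
As a rough illustration of the idea, the sketch below bins a numerical variable with a one-level decision tree: it picks the single cut point that minimizes the weighted Gini impurity of the target. The data, the function names `gini` and `best_threshold`, and the choice of Gini as the split criterion are assumptions made for this example, not details taken from the original post.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return the cut point that minimizes the weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= cut]
        right = [lab for v, lab in pairs if v > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut


# Hypothetical toy data: customer age and whether they interacted with the app.
ages = [18, 22, 25, 31, 38, 45, 52, 60]
interacted = [1, 1, 1, 1, 0, 0, 0, 0]

cut = best_threshold(ages, interacted)
# Turn the numerical variable into a two-level categorical variable.
bins = ["<= %.1f" % cut if a <= cut else "> %.1f" % cut for a in ages]
```

Applying the same search recursively inside each resulting bin would give more than two bins, which is essentially what a deeper tree does.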


Binary Classification: Flight delay prediction

#artificialintelligence

We approach this problem as a classification problem, predicting two classes -- whether the flight will be delayed, or whether it will be on time. Broadly speaking, in machine learning and statistics, classification is the task of identifying the class or category to which a new observation belongs, on the basis of a training set of data containing observations with known categories. Classification is generally a supervised learning problem. Since this is a binary classification task, there are only two classes. To solve this categorization problem, we will build an experiment using Azure ML Studio.


Machine Learning Algorithm : ensemble (part 7 of 12)

#artificialintelligence

In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost functional of logistic regression, one can derive the LogitBoost algorithm. LogitBoost can be seen as a convex optimization. Bootstrap Aggregation (or Bagging for short) is a simple and very powerful ensemble method.
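
A minimal sketch of Bagging, under stated assumptions: each round draws a bootstrap sample (same size as the data, with replacement), fits a model on it, and the per-round results are averaged. The function name `bagged_estimate`, the toy data, and the use of a plain mean as the stand-in "model" are hypothetical; in practice the base model is usually a decision tree.

```python
import random
from statistics import mean


def bagged_estimate(data, fit, n_rounds=50, seed=0):
    """Average the result of `fit` over bootstrap resamples of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_rounds):
        # Bootstrap sample: draw len(data) items with replacement.
        sample = [rng.choice(data) for _ in data]
        estimates.append(fit(sample))
    # Aggregate the per-sample estimates.
    return mean(estimates)


# Hypothetical toy data; `mean` stands in for a real base learner.
delays = [2.0, 5.0, 7.0, 11.0, 40.0]
estimate = bagged_estimate(delays, fit=mean)
```

Averaging over many resamples reduces the variance of an unstable base learner, which is why the method pairs well with decision trees.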


R Decision Tree

#artificialintelligence

A decision tree is a graph that represents choices and their results in the form of a tree. The nodes in the graph represent an event or choice, and the edges represent the decision rules or conditions. It is mostly used in machine learning and data mining applications using R. Examples of the use of decision trees include predicting whether an email is spam, predicting whether a tumor is cancerous, and classifying a loan as a good or bad credit risk based on the factors in each case. Generally, a model is created with observed data, also called training data. Then a set of validation data is used to verify and improve the model.


Analytics and Pricing Decisions

@machinelearnbot

Enterprises make critical pricing decisions very often. These decisions have a major impact on their profitability index and also determine the bottom line of the organization. Often, managers and analysts are forced to determine prices and discounts with limited available information and inefficient toolsets. Pricing analytics enables the leadership of a firm to proactively manage pricing policies and strategies, and guides them to arrive at pricing that is in line with the company's market positioning and business strategy. Pricing analytics empowers analysts to make insight-driven pricing decisions, measure the effectiveness of these decisions, and, when required, make adjustments using consistent data within the right business context. Evolving Environment: Organizations are seeking to overhaul their pricing capabilities but face the challenges of fierce competition, highly volatile commodity prices, and ever-demanding retailers in achieving this.


A Classification Engine for Image Ballistics of Social Data

arXiv.org Artificial Intelligence

Image Forensics has already achieved great results for the source camera identification task on images. Standard approaches for data coming from Social Network Platforms cannot be applied due to the different processes involved (e.g., scaling, compression, etc.). Over 1 billion images are shared each day on the Internet, and obtaining information about their history from the moment they were acquired could be exploited for investigation purposes. In this paper, a classification engine for the reconstruction of the history of an image is presented. Specifically, exploiting K-NN and decision tree classifiers and a-priori knowledge acquired through image analysis, we propose an automatic approach that can understand which Social Network Platform has processed an image and the software application used to perform the image upload. The engine makes use of proper alterations introduced by each platform as features. Results, in terms of global accuracy on a dataset of 2720 images, confirm the effectiveness of the proposed strategy.


Hidden Decision Trees vs. Decision Trees or Logistic Regression

@machinelearnbot

Hidden Decision Trees is a statistical and data mining methodology (just like logistic regression, SVM, neural networks or decision trees) to handle problems with large amounts of data, non-linearities and strongly correlated dependent variables. The technique is easy to implement in any programming language. It is more robust than decision trees or logistic regression, and helps detect natural final nodes. Implementations typically rely heavily on large, granular hash tables. No decision tree is actually built (thus the name hidden decision trees), but the final output of a hidden decision tree procedure consists of a few hundred nodes from multiple non-overlapping small decision trees.


Optimizing your prediction model on Azure – pruning the trees - MD2C

#artificialintelligence

This is a simple example of optimizing your prediction model on Azure. In this case we will use a Boosted Decision Tree model. We will show you how you can use the Permutation Feature Importance module to prune your trees. We start with the Student Performance Classifier from a previous blog. We already found out that the Boosted Decision Tree algorithm gave the best results, so we will start with that one to train our model.


Introduction to Classification & Regression Trees (CART)

@machinelearnbot

Decision Trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (or dependent) variable based on the values of several input (or independent) variables. In today's post, we discuss the CART decision tree methodology. The CART, or Classification & Regression Trees, methodology was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term for the following types of decision trees: Classification Trees, where the target variable is categorical and the tree is used to identify the "class" into which the target variable would most likely fall; and Regression Trees, where the target variable is continuous and the tree is used to predict its value. The CART algorithm is structured as a sequence of questions, the answers to which determine what the next question, if any, should be. The result of these questions is a tree-like structure whose ends are terminal nodes, at which point there are no more questions.
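
The "sequence of questions" structure can be sketched as follows. This is a hand-built classification tree for illustration only: the features (`age`, `sessions`), thresholds, class labels and the `predict` helper are hypothetical, and no tree is being learned from data here.

```python
# Each internal node asks "is row[feature] <= threshold?"; each leaf holds a class.
tree = {
    "question": ("age", 30),
    "yes": {"leaf": "active"},
    "no": {
        "question": ("sessions", 5),
        "yes": {"leaf": "inactive"},
        "no": {"leaf": "active"},
    },
}


def predict(node, row):
    """Follow the answers from the root until a terminal node is reached."""
    while "leaf" not in node:
        feature, threshold = node["question"]
        # The answer to this question determines the next question, if any.
        node = node["yes"] if row[feature] <= threshold else node["no"]
    return node["leaf"]


label = predict(tree, {"age": 40, "sessions": 3})
```

A regression tree would have the same shape, with each leaf holding a numeric value instead of a class label.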


Random Forests Algorithm

@machinelearnbot

One of the most popular methods or frameworks used by data scientists at the Rose Data Science Professional Practice Group is Random Forests. The Random Forests algorithm is one of the best classification algorithms, able to classify large amounts of data accurately. Random Forests are an ensemble learning method (also thought of as a form of nearest neighbor predictor) for classification and regression that constructs a number of decision trees at training time and outputs the class that is the mode of the classes output by the individual trees (Random Forests is a trademark of Leo Breiman and Adele Cutler for an ensemble of decision trees). Random Forests are a combination of tree predictors where each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The basic principle is that a group of "weak learners" can come together to form a "strong learner".
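
The voting step described above (output the mode of the classes predicted by the individual trees) might look like the sketch below. The three "trees" are hypothetical hand-written rules standing in for trained decision trees, and the feature names are invented for the example; a real forest would train each tree on a bootstrap sample with random feature subsets.

```python
from collections import Counter


def forest_predict(trees, row):
    """Return the most common class among the individual tree predictions."""
    votes = [tree(row) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical weak learners, each looking at a single feature.
trees = [
    lambda r: "delayed" if r["dep_delay"] > 10 else "on_time",
    lambda r: "delayed" if r["distance"] > 1500 else "on_time",
    lambda r: "delayed" if r["dep_delay"] > 20 else "on_time",
]

prediction = forest_predict(trees, {"dep_delay": 15, "distance": 2000})
```

Using an odd number of trees avoids ties in a two-class problem; for regression, the vote would be replaced by an average of the tree outputs.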