Decision Tree Learning

Introduction to Boosted Trees


Welcome to my new article series: Boosting algorithms in machine learning! This is Part 1 of the series. Here, I'll give you a short introduction to boosting, its objective, some key definitions and a list of the boosting algorithms that I intend to cover in the next posts. You should be familiar with elementary tree-based machine learning models such as decision trees and random forests. A good knowledge of Python and its Scikit-learn library is also recommended.

Decision Tree -- Explained


In this blog we are going to talk about the decision tree algorithm. Yeah, you read that right. It is a tree, or rather it looks like an upside-down tree, that helps us make decisions. How can a tree help us make a decision? And how do we make any decision in the first place?

Feature selection with Random Forest


Random Forest is a supervised model that combines decision trees with the bagging method. The idea is that the training dataset is resampled according to a procedure called "bootstrap". Each sample contains a random subset of the original columns and is used to fit a decision tree. The number of models and the number of columns are hyperparameters to be optimized. Finally, the predictions of the trees are combined by calculating the mean value (for regression) or by soft voting (for classification).
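The resampling and aggregation steps described in this excerpt can be sketched in plain Python. This is a minimal illustration, not the implementation of any particular library: the helper names (`bootstrap_indices`, `random_columns`, `average`, `soft_vote`) and the example numbers are assumptions made for this sketch, and the trees themselves are left out.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (the "bootstrap")."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_columns(n_cols, k, rng):
    """Pick k distinct column indices for one tree."""
    return sorted(rng.sample(range(n_cols), k))

def average(predictions):
    """Regression: mean of the per-tree predictions."""
    return sum(predictions) / len(predictions)

def soft_vote(probabilities):
    """Classification: average the per-class probabilities of all trees
    and return (winning class index, averaged probabilities)."""
    n_trees = len(probabilities)
    n_classes = len(probabilities[0])
    mean_probs = [
        sum(p[c] for p in probabilities) / n_trees for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: mean_probs[c])
    return winner, mean_probs

rng = random.Random(0)                 # fixed seed, only for repeatability
rows = bootstrap_indices(10, rng)      # rows used to fit one tree
cols = random_columns(5, 2, rng)       # columns used to fit that tree

# Hypothetical class probabilities from three trees for one sample.
tree_probs = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
label, mean_probs = soft_vote(tree_probs)
```

In the hypothetical example, two of the three trees lean towards class 1, but soft voting picks class 0 because the averaged probability for class 0 is higher. That is the difference between soft voting and a simple majority vote.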

Gini Impurity vs Information Gain vs Chi-Square - Methods for Decision Tree Split


Decision trees are among the most widely used machine learning models because they are easy to implement and simple to interpret. To learn better from the data they are applied to, the nodes of a decision tree need to be split based on the attributes of the data. In this article, we will look at why a decision tree needs to be split and at the methods used to split its nodes. Gini impurity, information gain and chi-square are the three most widely used methods for splitting decision trees. Here we will discuss these three methods and try to find out how important each is in specific cases.
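The three split criteria named in this excerpt can be computed by hand for a single candidate split. The sketch below uses the textbook definitions (Gini impurity, entropy-based information gain, and the Pearson chi-square statistic over the child nodes). The linked article may define chi-square slightly differently, and the function names and toy labels are assumptions made for illustration.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

def chi_square(parent, children):
    """Pearson chi-square: sum of (observed - expected)^2 / expected over
    every (child, class) cell, where expected follows the parent's class mix."""
    n = len(parent)
    parent_counts = Counter(parent)
    stat = 0.0
    for ch in children:
        observed = Counter(ch)
        for cls, total in parent_counts.items():
            expected = len(ch) * total / n
            stat += (observed.get(cls, 0) - expected) ** 2 / expected
    return stat

# Toy split: 10 samples, 5 "yes" and 5 "no", divided into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

parent_gini = gini(parent)                       # impurity before the split
left_gini = gini(left)                           # impurity of one child
gain = information_gain(parent, [left, right])   # higher is better
chi2 = chi_square(parent, [left, right])         # higher is better
```

A lower Gini impurity in the children, a higher information gain, and a higher chi-square value all point the same way here: the split separates the two classes better than the unsplit parent node does.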

Understanding Random Forest's hyperparameters with images


The Decision Tree is a widely used algorithm for solving problems. It tries to simulate the human thinking process by turning each step of the decision into a binary choice. So, at each step, the algorithm chooses between True and False to move forward. The algorithm is simple, yet very powerful, and is therefore widely applied in machine learning models. However, one of the problems with Decision Trees is that they generalize poorly. The algorithm learns so well how to decide about a given dataset that, when we apply it to new data, it fails to give us the best answer.

Decision Tree Algorithm


Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome. In a decision tree, there are two types of nodes: the decision node and the leaf node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the output of those decisions and do not contain any further branches. The decisions or tests are performed on the basis of the features of the given dataset.
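The two node types described in this excerpt map naturally onto a small data structure. The following is a minimal sketch under stated assumptions: every decision node is binary and tests one numeric feature against a threshold, and the feature names (`age`, `income`), thresholds and outcomes are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    """Leaf node: holds the outcome and has no further branches."""
    outcome: str

@dataclass
class Decision:
    """Decision node: tests one feature and branches on the result."""
    feature: str
    threshold: float
    if_true: "Node"    # branch taken when sample[feature] <= threshold
    if_false: "Node"   # branch taken otherwise

Node = Union[Leaf, Decision]

def predict(node, sample):
    """Follow decision rules from the root until a leaf is reached."""
    while isinstance(node, Decision):
        if sample[node.feature] <= node.threshold:
            node = node.if_true
        else:
            node = node.if_false
    return node.outcome

# Hypothetical hand-built tree with two decision nodes and three leaves.
tree = Decision(
    feature="age",
    threshold=30.0,
    if_true=Leaf("decline"),
    if_false=Decision(
        feature="income",
        threshold=50000.0,
        if_true=Leaf("decline"),
        if_false=Leaf("approve"),
    ),
)

young = predict(tree, {"age": 25, "income": 80000})
older_high_income = predict(tree, {"age": 45, "income": 80000})
older_low_income = predict(tree, {"age": 45, "income": 20000})
```

In practice the tree is learned from data rather than written by hand, but prediction works the same way: each sample walks from the root through the decision nodes until it lands on a leaf.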

What's in a "Random Forest"? Predicting Diabetes


If you've heard of "random forests" as a hot, sexy machine learning algorithm and you want to implement it, great! But if you're not sure exactly what happens in a random forest, or how random forests make their classification decisions, then read on:) We'll find that we can break down random forests into smaller, more digestible pieces. As a forest is made of trees, so a random forest is made of a bunch of randomly sampled sub-components called decision trees. So first let's try to understand what a decision tree is, and how it comes to its prediction. For now, we'll just look at classification decision trees.

How to Mitigate Overfitting by Creating Ensembles


If we summarize what we've done so far in the "Addressing the problem of overfitting" article series, we've discussed three different techniques that can be used to mitigate overfitting. As you already know, Cross-validation (discussed in Part 1), Regularization (discussed in Part 2) and Dimensionality Reduction (discussed in Part 3) can effectively mitigate overfitting. In Part 4, today we discuss another useful technique called Creating Ensembles. However, this technique is limited to tree-based models. Someone can attempt to build a decision tree model (Step 1) without limiting the tree growth (without early stopping or without doing any hyperparameter tuning).

Reaching MLE (machine learning enlightenment) · Vicki Boykis


Once, on a crisp cloudless morning in early fall, a machine learning engineer left her home to seek the answers that she could not find, even in the newly-optimized Google results. She closed her laptop, put on her backpack and hiking boots, and walked quietly out her door and past her mailbox, down a dusty path that led past a stream, until the houses around her gave way to broad fields full of ripening corn. She walked past farms where cows grazed peacefully underneath enormous data silos, until the rows of crops gave way to a smattering of graceful pines and oaks, and she found herself in a forest clearing, headed into the woods. She went deeper through the decision trees and finally stopped near a data stream around midday to have lunch and stretch her legs. The sun made its way through the sky and eventually, she walked further, out of the forest.

Minimax Rates for STIT and Poisson Hyperplane Random Forests

In [12], Mourtada, Gaïffas and Scornet showed that, under proper tuning of the complexity parameters, random trees and forests built from the Mondrian process in $\mathbb{R}^d$ achieve the minimax rate for $\beta$-Hölder continuous functions, and random forests achieve the minimax rate for $(1+\beta)$-Hölder functions in arbitrary dimension. In this work, we show that a much larger class of random forests built from random partitions of $\mathbb{R}^d$ also achieve these minimax rates. This class includes STIT random forests, the most general class of random forests built from a self-similar and stationary partition of $\mathbb{R}^d$ by hyperplane cuts possible, as well as forests derived from Poisson hyperplane tessellations. Our proof technique relies on classical results as well as recent advances on stationary random tessellations in stochastic geometry.