Collaborating Authors

Decision Tree Learning

Introduction to Random Forest Algorithm


Random Forest is a supervised machine learning algorithm composed of many individual decision trees. This type of model is called an ensemble model because an "ensemble" of independent models is used to compute a result. A tree consists of different decision levels and branches, which are used to classify data. The decision tree algorithm tries to divide the training data into classes so that the objects within a class are as similar as possible and the objects of different classes are as different as possible. For example, a tree can help decide whether to do sports outside or not, depending on the weather variables "weather", "humidity" and "wind force".
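To make the tree-versus-forest idea concrete, here is a minimal sketch using scikit-learn. The toy data set, the integer encoding of the "weather" variable, and all variable names are assumptions made up for illustration; they do not come from the article.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data, one row per day:
# [weather (0=sunny, 1=overcast, 2=rainy), humidity in %, wind force]
# Encoding "weather" as an ordered integer is a simplification for this sketch.
X = [
    [0, 85, 1], [0, 90, 3], [1, 78, 1], [2, 96, 1],
    [2, 80, 1], [2, 70, 3], [1, 65, 3], [0, 95, 1],
    [0, 70, 1], [2, 75, 1], [0, 70, 3], [1, 90, 3],
]
# Label: 1 = do sports outside, 0 = stay inside
y = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1]

# A single decision tree splits the data level by level.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# A random forest trains many trees on bootstrap samples and combines their predictions.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

new_day = [[0, 75, 1]]  # sunny, 75 % humidity, light wind
tree_vote = tree.predict(new_day)[0]
forest_proba = forest.predict_proba(new_day)[0]  # averaged over the 50 trees

print("single tree says:", tree_vote)
print("forest class probabilities:", forest_proba)
```

With only twelve rows the predictions carry no real meaning; the point is that the forest returns a probability averaged over its trees, while the single tree returns one hard decision.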

Discriminatory AI explained with an example


AI is increasingly used to make decisions that affect us directly, such as job applications, credit ratings, and match-making on dating sites. So it is important that AI is non-discriminatory and that its decisions do not favor a particular race, gender, or skin color. Discriminatory AI is a very wide subject that goes beyond purely technical aspects. However, to make it easily understandable, I will demonstrate what discriminatory AI looks like using examples and visuals. This will give you a way to spot a discriminatory AI. Let me first establish the context of the example.

Understanding your Neural Network's predictions


Neural networks are extremely convenient. They are usable for both regression and classification, work on structured and unstructured data, handle temporal data very well, and can usually reach high performance if they are given a sufficient amount of data. What is gained in convenience is, however, lost in interpretability, and that can be a major setback when models are presented to a non-technical audience, such as clients or stakeholders. For instance, last year, the Data Science team I am part of wanted to convince a client to go from a decision tree model to a neural network, and for good reasons: we had access to a large amount of data and most of it was temporal. The client was on board, but wanted to keep an understanding of what the model based its decisions on, which means evaluating the importance of its features.
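One common, model-agnostic way to estimate feature importance for a neural network is permutation importance: shuffle one feature at a time and measure how much the score drops. The sketch below uses scikit-learn for this; it is not necessarily the method the article goes on to use, and the synthetic data and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

# Synthetic data (an assumption for this sketch): feature 0 matters most,
# feature 1 matters a little, feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=300)

# A small feed-forward network standing in for "the neural network".
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

# Shuffle each feature 10 times and record the mean drop in R^2.
result = permutation_importance(net, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]

for idx in ranking:
    print(f"feature {idx}: mean importance {result.importances_mean[idx]:.3f}")
```

In practice the importances would be computed on held-out data rather than on the training set, so that they reflect what the model has actually generalized.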

The Application of Machine Learning Techniques for Predicting Match Results in Team Sport: A Review

Journal of Artificial Intelligence Research

Predicting the results of matches in sport is a challenging and interesting task. In this paper, we review a selection of studies from 1996 to 2019 that used machine learning for predicting match results in team sport. Considering both invasion sports and striking/fielding sports, we discuss commonly applied machine learning algorithms, as well as common approaches related to data and evaluation. Our study considers accuracies that have been achieved across different sports, and explores whether evidence exists to support the notion that outcomes of some sports may be inherently more difficult to predict. We also uncover common themes of future research directions and propose recommendations for future researchers. Although there remains a lack of benchmark datasets (apart from in soccer), and the differences between sports, datasets and features make between-study comparisons difficult, as we discuss, it is possible to evaluate accuracy performance in other ways. Artificial Neural Networks were commonly applied in early studies; however, our findings suggest that a range of models should instead be compared. Selecting and engineering an appropriate feature set appears to be more important than having a large number of instances. For feature selection, we see potential for greater inter-disciplinary collaboration between sport performance analysis, a sub-discipline of sport science, and machine learning.

Decision Trees vs Random Forest


Last week I published two articles about Decision Trees: one about Classification and Regression Trees (CART) and a tutorial on how to implement a Random Forest classifier. These two methods may look very similar; however, there are important differences that every data professional or enthusiast should know.

All About Decision Tree


Originally published on Towards AI. The decision tree is one of the most powerful and important algorithms in supervised machine learning.

How to speed up machine learning operations with Jax?


Machine learning algorithms require many mathematical operations, and as a model's performance improves, the number and complexity of those operations usually grow as well. A simple example is the random forest versus the decision tree: the random forest is more accurate in most cases, but it involves more computation and takes more time than a single decision tree. Robust modelling requires a process in which large numerical operations can be completed quickly and reliably. JAX is a library that can help speed up such mathematical operations. In this article, we will discuss the JAX library in detail.

A guide to feature engineering in time series with Tsfresh


Feature engineering plays a crucial role in many data modelling tasks. It is the process of deriving informative features from the data so that a model can improve its performance. In time series modelling, feature engineering works differently because the data is sequential: it is formed by the changes in values over time. In this article, we discuss feature engineering for time series and cover an implementation using a package called tsfresh. The major points to be discussed in the article are listed below.

Random Forest Regression


A few weeks ago, I wrote an article demonstrating random forest classification models. In this article, we will demonstrate the regression case of random forest using sklearn's RandomForestRegressor() model. As in my last article, I will begin by highlighting some definitions and terms that form the backbone of random forest machine learning. The goal of this article is to describe the random forest model and demonstrate how it can be applied using the sklearn package. Our goal will not be to find the optimal solution, as this is just a basic guide.
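As a rough illustration of the regression case, the sketch below fits sklearn's RandomForestRegressor to a noisy sine curve. The synthetic data, the hyperparameters, and the variable names are assumptions chosen for this example, not taken from the article, and no tuning is attempted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem (an assumption for this sketch): y = sin(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)

# Hold out a quarter of the data to check generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each of the 100 trees is fit on a bootstrap sample;
# the forest's prediction is the average of the trees' predictions.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
r2 = r2_score(y_test, predictions)
print(f"R^2 on held-out data: {r2:.3f}")
```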

Top resources to learn decision trees in 2022


Decision trees are a supervised learning method used to build a model that predicts the value of a target variable by learning simple decision rules from the data features. DTs are used for both classification and regression and are simple to understand and interpret. Below, we have listed the top online courses, YouTube videos and guides for enthusiasts to master decision trees. The course by Codecademy focuses on teaching developers how to build and use decision trees and random forests. The course looks at two splitting criteria in detail: Gini impurity and information gain.
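For readers unfamiliar with the two criteria, here is a small pure-Python sketch of their standard textbook definitions. The function names and the example labels are hypothetical and are not taken from the course.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" labels, and one candidate split of it.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print("Gini impurity of parent:", gini(parent))
print("Entropy of parent:", entropy(parent))
print("Information gain of split:", round(information_gain(parent, split), 4))
```

A tree-building algorithm evaluates many candidate splits this way and keeps the one with the lowest weighted impurity or, equivalently for entropy, the highest information gain.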