Decision Tree Learning


From Decision Trees and Random Forests to Gradient Boosting

#artificialintelligence

Suppose we wish to perform supervised learning on a classification problem to determine whether an incoming email is spam or not. The spam dataset consists of 4601 emails, each labelled as not spam (0) or spam (1). The data also contain a large number of predictors (57), each of which is either a character count or the frequency of occurrence of a certain word or symbol. In this short article, we will briefly cover the main concepts in tree-based classification and compare and contrast the most popular methods. This dataset and several worked examples are covered in detail in The Elements of Statistical Learning, second edition.
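To make the setup concrete, here is a minimal sketch (assuming scikit-learn, with synthetic stand-in data rather than the actual spam dataset) of fitting one of the gradient boosting models the article works toward:

```python
# Minimal sketch (not from the article): a gradient boosting classifier on
# synthetic data shaped like the spam dataset (4601 rows, 57 predictors,
# binary label with 0 = not spam, 1 = spam). Assumes scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stand-in for the 4601-email, 57-predictor spam data
X, y = make_classification(n_samples=4601, n_features=57, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```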


Interpretability, Explainability, and Machine Learning

#artificialintelligence

Susan will present "Understanding and Addressing Bias in Analytics" at CONVERGE, December 1-2. This article was originally published on KDnuggets. I use one of those credit monitoring services that regularly emails me about my credit score: "Congratulations, your score has gone up!" "Uh oh, your score has gone down!" I shrug and delete the emails. Credit scores are just one example of the many automated decisions made about us as individuals on the basis of complex models.


How to Future-Proof Your Data Science Project - KDnuggets

#artificialintelligence

Nontechnical stakeholders struggle to define business requirements. Cross-functional teams face an uphill battle to set up robust pipelines for replicable data delivery. Machine learning models can take on a life of their own. If you've been ignoring these critical elements in the past, addressing them now may make your deployment rate skyrocket. Your data products may depend on correctly applying the tips from this article.


3 decision tree-based algorithms for Machine Learning

#artificialintelligence

Decision trees are tree-structured algorithms that split the data based on a series of decisions. Look at the image below of a very simple decision tree. We want to decide whether an animal is a cat or a dog based on two questions. We can answer each question and, depending on the answers, classify the animal as either a dog or a cat. The red lines represent the answer "NO" and the green line, "YES".
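A minimal sketch of such a tree (the two yes/no features below are invented for illustration and are not the questions from the article's image):

```python
# Minimal sketch: a small decision tree separating cats from dogs.
# Features are hypothetical yes/no answers encoded as 0 = NO, 1 = YES.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: [barks, fetches_sticks]; labels: 0 = cat, 1 = dog
X = [[1, 1], [1, 0], [0, 0], [0, 1]]
y = [1, 1, 0, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["barks", "fetches_sticks"]))
print(tree.predict([[0, 1]]))  # a non-barking animal is classified as a cat here
```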


Decision Trees, Random Forests, AdaBoost & XGBoost in Python

#artificialintelligence

You're looking for a complete decision tree course that teaches you everything you need to create a Decision Tree, Random Forest, or XGBoost model in Python, right? You've found the right Decision Trees and tree-based advanced techniques course! How will this course help you? A Verifiable Certificate of Completion is presented to all students who complete this advanced machine learning course. If you are a business manager, an executive, or a student who wants to learn and apply machine learning to real-world business problems, this course will give you a solid base by teaching you some of the advanced techniques of machine learning: Decision Trees, Random Forests, Bagging, AdaBoost and XGBoost.


Regression Trees for Cumulative Incidence Functions

#artificialintelligence

The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past decade. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and related ensemble methods, have begun only recently. In this paper, we develop a novel approach to building regression trees for estimating cumulative incidence curves in a competing risks setting. The proposed methods employ augmented estimators of the Brier score risk as the primary basis for building and pruning trees.
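For context, the plain (uncensored) Brier score is just the mean squared difference between a predicted event probability and the observed 0/1 outcome; the augmented, censoring-adjusted estimators the paper builds on are more involved, so the sketch below shows only the basic quantity:

```python
# Minimal sketch of the ordinary Brier score for a binary event.
# The paper's augmented estimators for cumulative incidence functions
# additionally handle censoring and competing risks; this does not.
import numpy as np

def brier_score(p_hat, y):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    p_hat = np.asarray(p_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p_hat - y) ** 2))

print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))  # lower is better
```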


Interpretability, Explainability, and Machine Learning – What Data Scientists Need to Know - KDnuggets

#artificialintelligence

I use one of those credit monitoring services that regularly emails me about my credit score: "Congratulations, your score has gone up!" "Uh oh, your score has gone down!" I shrug and delete the emails. Credit scores are just one example of the many automated decisions made about us as individuals on the basis of complex models. I don't know exactly what causes those little changes in my score. Some machine learning models are "black boxes," a term often used to describe models whose inner workings -- the ways different variables ended up related to one another by an algorithm -- may be impossible for even their designers to completely interpret and explain.


AI Clarified: Is AI More Biased Than Humans or Less?

#artificialintelligence

Exploring bias in AI systems, and what we can do to prevent it. For business and non-profit leaders trying to understand AI, it can be surprisingly difficult to find good information in the sweet spot between high-level overview and technical jargon. The AI Clarified series attempts to fill this void and answer some of the most commonly asked AI questions with practical, easy-to-follow explanations. Question: Is AI more biased than humans, or less? I've heard both and am not sure which side to believe. Indeed, it is hard to know what to believe about bias in Artificial Intelligence (AI) systems when just reading articles online -- there is plenty of support in both directions. With the growth of AI and the widespread adoption of AI models, there is a lot of noise on both sides, especially for high-stakes use cases like those affecting humans. Let's take hiring as an example.


A Non Mathematical guide to the mathematics behind Machine Learning

#artificialintelligence

A linear model finds the "best fit" line through a set of data points using a simple formula. The variable you want to predict (the dependent variable) is represented as an equation of variables you know (independent variables). A prediction is obtained by plugging the independent variables into the equation and reading off the result. The main categories of linear models are Linear Regression and Logistic Regression. Linear Regression is used for predicting numerical values using the "best fit" line through all data points.
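A minimal sketch of fitting such a "best fit" line with scikit-learn (the toy data below are invented for illustration):

```python
# Minimal sketch: fit a "best fit" line and predict a new value.
import numpy as np
from sklearn.linear_model import LinearRegression

# Independent variable x and dependent variable y (toy data, roughly y = 2x)
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression().fit(x, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction for x = 6:", model.predict([[6.0]])[0])
```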


How to Develop a Random Subspace Ensemble With Python

#artificialintelligence

The Random Subspace Ensemble is a machine learning algorithm that combines the predictions from multiple decision trees trained on different subsets of columns in the training dataset. Randomly varying the columns used to train each contributing member of the ensemble introduces diversity into the ensemble and, in turn, can lift performance over using a single decision tree. It is related to other ensembles of decision trees, such as bootstrap aggregation (bagging), which creates trees using different samples of rows from the training dataset, and random forest, which combines ideas from bagging and the random subspace ensemble. Although decision trees are often used, the general random subspace method can be used with any machine learning model whose performance varies meaningfully with the choice of input features. In this tutorial, you will discover how to develop random subspace ensembles for classification and regression.
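One way to sketch this (not necessarily the tutorial's exact code) is scikit-learn's BaggingClassifier, which defaults to decision trees as the base model: turning off row bootstrapping and offering each tree only a random fraction of the columns yields a random subspace ensemble.

```python
# Minimal sketch: a random subspace ensemble for classification.
# Each tree sees all rows (bootstrap=False) but only a random half of the
# columns (max_features=0.5), which is the random subspace idea.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           random_state=1)

ensemble = BaggingClassifier(
    n_estimators=50,
    bootstrap=False,     # use every row for every tree
    max_samples=1.0,
    max_features=0.5,    # random subset of columns per tree
    random_state=1,
)
print("mean CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```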