Collaborating Authors

decision tree learning

07 -- Hands On ML -- Ensemble


Ensemble Learning is taking the predictions of multiple models and assume the output to be having the most votes. When you train multiple Decision Trees each on some random sampling of the dataset and for predictions you take predictions of all the trees, the output class would be the class which gets the most votes. This approach is called Random Forest. Voting classifier is when you train the data on multiple classifier such as Logistic Regression, SVM, RF and other classifiers and the majority vote is the predicted output class ie hard classifier. Voting can also be taken as soft by taking argmax of the outputs.

Data Science Meets Combinatorics


As a lifelong computational scientist (and now data scientist) I have always been fascinated with numbers, especially lists and tables of things ( databases!). For example, I thought early in life that I wanted to be a Math major in college and study number theory so that I could learn all of the amazing ways to do cool stuff with numbers. But I also wanted to study the wonders of the Universe as an astronomer, so I went on to get a PhD in astrophysics! That career path allowed me to study and apply all of the disciplines that I enjoy: math, physics, astronomy, computational modeling, and data science! It was numbers all the time!

Understanding Random Forests For Machine Learning


It has an important place in machine learning to solve regression and classification problems. It is useful for producing results with a machine learning algorithm without hypermeter tuning. So what does hypermeter tuning mean?

Machine Learning -- Beginners Guide to Random Forest Classifiers (The Code)


So if you haven't already checked it out, I have posted about the mathematics behind this machine learning technique. If this is the first time you're coming across this algorithm I recommend you give it a read before jumping into the code. Otherwise, we're going to jump right into it!

Tokyo Paralympics: Sarah Storey wins record-breaking 17th gold in women's C4-5 road race

BBC News

Sarah Storey wins her 17th Paralympic gold by defending her women's C4-5 road race title to become Great Britain's most successful Paralympian of all time.

Decision Tree Algorithm -A Complete Guide - Analytics Vidhya


Till now we have learned about linear regression, logistic regression, and they were pretty hard to understand. Let's now start with Decision tree's and I assure you this is probably the easiest algorithm in Machine Learning. There's not much mathematics involved here. Since it is very easy to use and interpret it is one of the most widely used and practical methods used in Machine Learning. Root Nodes – It is the node present at the beginning of a decision tree from this node the population starts dividing according to various features.

FARF: A Fair and Adaptive Random Forests Classifier Artificial Intelligence

As Artificial Intelligence (AI) is used in more applications, the need to consider and mitigate biases from the learned models has followed. Most works in developing fair learning algorithms focus on the offline setting. However, in many real-world applications data comes in an online fashion and needs to be processed on the fly. Moreover, in practical application, there is a trade-off between accuracy and fairness that needs to be accounted for, but current methods often have multiple hyperparameters with non-trivial interaction to achieve fairness. In this paper, we propose a flexible ensemble algorithm for fair decision-making in the more challenging context of evolving online settings. This algorithm, called FARF (Fair and Adaptive Random Forests), is based on using online component classifiers and updating them according to the current distribution, that also accounts for fairness and a single hyperparameters that alters fairness-accuracy balance. Experiments on real-world discriminated data streams demonstrate the utility of FARF.

Gradient Boosted Decision Trees explained with a real-life example and some Python code


Gradient Boosting algorithms tackle one of the biggest problems in Machine Learning: bias. Decision Trees is a simple and flexible algorithm. An underfit Decision Tree has low depth, meaning it splits the dataset only a few of times in an attempt to separate the data. Because it doesn't separate the dataset into more and more distinct observations, it can't capture the true patterns in it. When it comes to tree-based algorithms Random Forests was revolutionary, because it used Bagging to reduce the overall variance of the model with an ensemble of random trees.

InfoGram and Admissible Machine Learning Artificial Intelligence

We have entered a new era of machine learning (ML), where the most accurate algorithm with superior predictive power may not even be deployable, unless it is admissible under the regulatory constraints. This has led to great interest in developing fair, transparent and trustworthy ML methods. The purpose of this article is to introduce a new information-theoretic learning framework (admissible machine learning) and algorithmic risk-management tools (InfoGram, L-features, ALFA-testing) that can guide an analyst to redesign off-the-shelf ML methods to be regulatory compliant, while maintaining good prediction accuracy. We have illustrated our approach using several real-data examples from financial sectors, biomedical research, marketing campaigns, and the criminal justice system.

Trustworthy AI: A Computational Perspective Artificial Intelligence

In the past few decades, artificial intelligence (AI) technology has experienced swift developments, changing everyone's daily life and profoundly altering the course of human society. The intention of developing AI is to benefit humans, by reducing human labor, bringing everyday convenience to human lives, and promoting social good. However, recent research and AI applications show that AI can cause unintentional harm to humans, such as making unreliable decisions in safety-critical scenarios or undermining fairness by inadvertently discriminating against one group. Thus, trustworthy AI has attracted immense attention recently, which requires careful consideration to avoid the adverse effects that AI may bring to humans, so that humans can fully trust and live in harmony with AI technologies. Recent years have witnessed a tremendous amount of research on trustworthy AI. In this survey, we present a comprehensive survey of trustworthy AI from a computational perspective, to help readers understand the latest technologies for achieving trustworthy AI. Trustworthy AI is a large and complex area, involving various dimensions. In this work, we focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being. For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems. We also discuss the accordant and conflicting interactions among different dimensions and discuss potential aspects for trustworthy AI to investigate in the future.