Decision Tree Learning
Random Forests
Random forests can also be used to identify likely fraudulent transactions. For example, each transaction in a bank has a series of features such as the deviation from the mean transaction volume of the customer, the time of day, the location, and how these values differ from that customer's usual habits. This allows a bank to build a sophisticated model to predict the likelihood of a given transaction being fraudulent. If the probability of fraud exceeds a threshold, such as 50%, the bank can take action, such as freezing the card.
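Here is a minimal sketch of this workflow. It assumes three hypothetical numeric features: deviation from the customer's mean amount, deviation in hour of day from the customer's usual hours, and distance in km from the customer's usual location. The training set is tiny and made up. To stay self-contained, the sketch uses a simplified random forest. Each tree is a single-split stump trained on a bootstrap sample with a random feature subset, not a fully grown tree. The forest's fraud probability is the average of the trees' leaf fraud rates, compared against the 50% threshold from the text.

```python
import random
from collections import Counter

# Hypothetical features: [deviation from mean amount, hour-of-day deviation, km from usual location]
X = [
    [0.2, 1.0, 3.0], [0.1, 0.0, 1.0], [0.4, 2.0, 5.0],
    [0.3, 1.0, 2.0], [0.5, 0.0, 4.0], [0.2, 2.0, 8.0],
    [3.5, 9.0, 850.0], [4.0, 7.0, 1200.0], [2.8, 10.0, 640.0], [5.1, 8.0, 300.0],
]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # 1 = fraudulent (toy labels)


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Choose the (feature, threshold) among feature_ids with the lowest weighted Gini impurity."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    _, f, t, left, right = best
    overall = sum(y) / len(y)
    # Each leaf stores its fraction of fraud labels; an empty leaf falls back to the sample rate.
    p_left = sum(left) / len(left) if left else overall
    p_right = sum(right) / len(right) if right else overall
    return f, t, p_left, p_right


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)  # random feature subset for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def fraud_probability(forest, x):
    """Average the leaf fraud rates of all trees for transaction x."""
    return sum(pl if x[f] <= t else pr for f, t, pl, pr in forest) / len(forest)


def decide(forest, x, threshold=0.5):
    # Hypothetical actions: act only when the estimated probability exceeds the threshold.
    return "freeze_card" if fraud_probability(forest, x) > threshold else "approve"


forest = fit_forest(X, y)
print(decide(forest, [4.5, 9.0, 900.0]))  # unusual on every feature
print(decide(forest, [0.1, 0.0, 1.0]))    # typical for this customer
```

In practice a library implementation with full-depth trees would replace the stumps. The averaging-then-thresholding step stays the same.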
Altruist: Argumentative Explanations through Local Interpretations of Predictive Models
Mollas, Ioannis, Bassiliades, Nick, Tsoumakas, Grigorios
Interpretable machine learning is an emerging field that provides ways to gain insight into the rationale of machine learning models. It has been put on the machine learning map by suggesting ways to tackle key ethical and societal issues. However, existing interpretable machine learning techniques are far from comprehensible and explainable to the end user. Another key issue in this field is the lack of evaluation and selection criteria, which makes it difficult for end users to choose the most appropriate interpretation technique for their needs. In this study, we introduce a meta-explanation methodology that provides truthful interpretations, in terms of feature importance, to the end user through argumentation. At the same time, this methodology can be used as an evaluation or selection tool for multiple interpretation techniques based on feature importance.
Interpretable Machine Learning with an Ensemble of Gradient Boosting Machines
Konstantinov, Andrei V., Utkin, Lev V.
A method for the local and global interpretation of a black-box model on the basis of the well-known generalized additive models is proposed. It can be viewed as an extension or a modification of the algorithm using the neural additive model. The method is based on an ensemble of gradient boosting machines (GBMs) in which each GBM is learned on a single feature and produces a shape function of that feature. The ensemble is composed as a weighted sum of separate GBMs, resulting in a weighted sum of shape functions that form the generalized additive model. GBMs are built in parallel using randomized decision trees of depth 1, which gives a very simple architecture. The weights of the GBMs, which are also the feature weights, are computed in each boosting iteration using the Lasso method and are then updated by a specific smoothing procedure. In contrast to the neural additive model, the method provides feature weights in explicit form, and it is simple to train. Numerous numerical experiments with an algorithm implementing the proposed method on synthetic and real datasets demonstrate its efficiency and its properties for local and global interpretation.
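The core idea can be sketched as follows. Each feature gets its own boosted sequence of depth-1 trees, and the sum of those trees is that feature's shape function f_j. The model predicts intercept + Σ_j f_j(x_j). This sketch is a deliberate simplification, not the authors' algorithm. Features are boosted cyclically rather than in parallel. Split thresholds are chosen greedily rather than randomized. The Lasso-based weights and the smoothing procedure are omitted. The toy target and the names `fit_additive_gbm` and `shape_value` are hypothetical.

```python
# Toy additive target on a grid: y = 2 * x0 + (1 if x1 > 0.5 else 0)
X = [[i / 10, j / 4] for i in range(10) for j in range(5)]
y = [2 * x0 + (1.0 if x1 > 0.5 else 0.0) for x0, x1 in X]


def fit_stump_1d(xs, residuals):
    """Depth-1 regression tree on a single feature, chosen by squared error."""
    values = sorted(set(xs))
    best = None
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2  # split midway between adjacent distinct values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # constant feature: no split, predict the mean residual
        m = sum(residuals) / len(residuals)
        return float("inf"), m, m
    return best[1], best[2], best[3]


def fit_additive_gbm(X, y, n_rounds=100, lr=0.3):
    """Cyclic boosting: each round fits one depth-1 tree per feature on current residuals."""
    d = len(X[0])
    intercept = sum(y) / len(y)
    shapes = [[] for _ in range(d)]  # shapes[j] holds the shrunken stumps that make up f_j
    pred = [intercept] * len(y)
    for _ in range(n_rounds):
        for j in range(d):
            xs = [row[j] for row in X]
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            t, ml, mr = fit_stump_1d(xs, residuals)
            shapes[j].append((t, lr * ml, lr * mr))
            pred = [p + (lr * ml if x <= t else lr * mr) for p, x in zip(pred, xs)]
    return intercept, shapes


def shape_value(stumps, x):
    """Evaluate one shape function f_j at x (sum of its stumps)."""
    return sum(left if x <= t else right for t, left, right in stumps)


def predict(model, row):
    intercept, shapes = model
    return intercept + sum(shape_value(s, v) for s, v in zip(shapes, row))


model = fit_additive_gbm(X, y)
row = [0.7, 0.75]
# Local explanation: the per-feature contributions that, with the intercept, add up to the prediction.
local_explanation = [shape_value(s, v) for s, v in zip(model[1], row)]
```

For a global view, plot `shape_value(model[1][j], x)` over a range of x for each feature j. Because the model is additive, each feature's effect can be read off independently.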
Online Decision Trees with Fairness
While artificial intelligence (AI)-based decision-making systems are increasingly popular, significant concerns have been raised about potential discrimination in the AI decision-making process. For example, the distribution of predictions is often biased and depends on sensitive attributes (e.g., gender and ethnicity). Numerous approaches have therefore been proposed to develop decision-making systems that are discrimination-conscious by design; these are typically batch-based and require all training data to be available simultaneously for model learning. In the real world, however, data usually arrive as streams, which requires the model to process each input once "on arrival", without storing and reprocessing it. In addition, data streams may evolve over time, so the model must also adapt to non-stationary data distributions and time-evolving bias patterns while maintaining an effective and robust trade-off between accuracy and fairness. In this paper, we propose a novel framework for online decision trees with fairness on data streams with possible distribution drift. Specifically, we first propose two novel fairness splitting criteria that encode the data as well as possible while simultaneously removing dependence on the sensitive attributes, and that further adapt to non-stationary distributions with fine-grained control when needed. Second, we propose two online growth algorithms for fair decision trees that fulfill different online fair decision-making requirements. Our experiments show that our algorithms are able to deal with discrimination in massive, non-stationary streaming environments, with a better trade-off between fairness and predictive performance.
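To make the idea of a fairness splitting criterion concrete, here is one simple heuristic. It is shown purely for illustration and is not either of the paper's two criteria. The heuristic rewards a split's information gain about the class label and subtracts its information gain about the sensitive attribute. The rows, feature names, and group labels are hypothetical, and the streaming machinery (deciding when enough samples have arrived to split, handling drift) is not shown. In the toy data, splitting on `region` and splitting on `income` tell us equally much about the label. However, `region` exactly mirrors the sensitive group, so the fairness-aware score prefers `income`.

```python
from collections import Counter
from math import log2


def entropy(values):
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(parent, partitions):
    n = len(parent)
    return entropy(parent) - sum(len(p) / n * entropy(p) for p in partitions)


def split_gains(rows, split):
    """rows: (features, label, sensitive) tuples; split: maps features to a branch key."""
    branches = {}
    for feats, label, sens in rows:
        branches.setdefault(split(feats), []).append((label, sens))
    labels = [label for _, label, _ in rows]
    sensitive = [sens for _, _, sens in rows]
    ig_label = info_gain(labels, [[l for l, _ in b] for b in branches.values()])
    ig_sens = info_gain(sensitive, [[s for _, s in b] for b in branches.values()])
    return ig_label, ig_sens


def fair_split_score(rows, split):
    # Higher is better: informative about the label, uninformative about the sensitive group.
    ig_label, ig_sens = split_gains(rows, split)
    return ig_label - ig_sens


rows = [
    ({"income": "high", "region": "north"}, 1, "group_a"),
    ({"income": "high", "region": "north"}, 1, "group_a"),
    ({"income": "low", "region": "north"}, 1, "group_a"),
    ({"income": "low", "region": "south"}, 0, "group_b"),
    ({"income": "low", "region": "south"}, 0, "group_b"),
    ({"income": "high", "region": "south"}, 1, "group_b"),
]
by_income = fair_split_score(rows, lambda f: f["income"])
by_region = fair_split_score(rows, lambda f: f["region"])
```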
Evaluating Tree Explanation Methods for Anomaly Reasoning: A Case Study of SHAP TreeExplainer and TreeInterpreter
Sharma, Pulkit, Mirzan, Shezan Rohinton, Bhandari, Apurva, Pimpley, Anish, Eswaran, Abhiram, Srinivasan, Soundar, Shao, Liqun
Understanding predictions made by Machine Learning models is critical in many applications. In this work, we investigate the performance of two methods for explaining tree-based models: 'Tree Interpreter (TI)' and 'SHapley Additive exPlanations TreeExplainer (SHAP-TE)'. Using a case study on detecting anomalies in job runtimes of applications that utilize cloud-computing platforms, we compare these approaches using a variety of metrics, including computation time, significance of attribution value, and explanation accuracy. We find that, although SHAP-TE offers consistency guarantees over TI at the cost of increased computation, consistency does not necessarily improve explanation performance in our case study.
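The TI idea is cheap to compute. Start from the root's value, which is the training mean and serves as the "bias". Walk the decision path, and credit each change in node value to the feature that was split on. The bias plus all contributions then equals the leaf prediction. Below is a minimal sketch on a hand-built regression tree. The features (`input_size_gb`, `num_workers`) and the runtime values in minutes are made up. SHAP-TE, which averages contributions over feature orderings, is not shown.

```python
# Hypothetical regression tree for job runtime (minutes). Every node stores the mean
# target of the training samples that reach it; internal nodes also store a split.
tree = {
    "value": 10.0, "feature": "input_size_gb", "threshold": 5.0,
    "left": {"value": 6.0, "feature": "num_workers", "threshold": 8,
             "left": {"value": 8.0}, "right": {"value": 4.0}},
    "right": {"value": 14.0, "feature": "num_workers", "threshold": 8,
              "left": {"value": 18.0}, "right": {"value": 12.0}},
}


def tree_interpret(node, x):
    """Return (prediction, bias, contributions) with prediction == bias + sum(contributions)."""
    bias = node["value"]
    contributions = {}
    while "feature" in node:
        f = node["feature"]
        child = node["left"] if x[f] <= node["threshold"] else node["right"]
        # Credit the change in node value along this edge to the split feature.
        contributions[f] = contributions.get(f, 0.0) + child["value"] - node["value"]
        node = child
    return node["value"], bias, contributions


prediction, bias, contributions = tree_interpret(tree, {"input_size_gb": 12.0, "num_workers": 4})
print(prediction, bias, contributions)
```

For an anomalously long job, the features with the largest positive contributions are natural candidates for explaining the anomaly.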
Succinct Explanations With Cascading Decision Trees
Zhang, Jialu, Santolucito, Mark, Piskac, Ruzica
Classic decision tree learning is a binary classification algorithm that constructs models with first-class transparency: every classification has a directly derivable explanation. However, learning decision trees on modern datasets produces large trees, which in turn produce decision paths of excessive depth, obscuring the explanation of classifications. To improve the comprehensibility of classifications, we propose a new decision tree model that we call Cascading Decision Trees. Cascading Decision Trees reduce the size of classification explanations without sacrificing overall model performance. Our key insight is to separate the notion of a decision path from that of an explanation path. Using this insight, instead of building one monolithic decision tree, we build several smaller decision subtrees and cascade them in sequence. Our cascading decision subtrees are designed specifically to target explanations for positive classifications. In this way, each subtree identifies the smallest set of features that can classify as many positive samples as possible without misclassifying any negative samples. Applying cascading decision trees to new samples yields significantly shorter, more succinct explanations whenever one of the subtrees detects a positive classification. In that case, we stop immediately and report only the decision path of the current subtree to the user as the explanation for the classification. We evaluate our algorithm on standard datasets as well as new real-world applications and find that our model shortens the explanation depth by over 40.8% for positive classifications compared to the classic decision tree model.
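The inference side of the cascade can be sketched in a few lines. The subtrees are tried in order. At the first subtree that predicts positive, evaluation stops and only that subtree's path is reported as the explanation. The sketch assumes a sample is labeled negative when no subtree fires. The subtrees and feature names are hypothetical, and the training step that searches for each subtree's smallest feature set is not shown.

```python
def classify_with_path(node, x):
    """Follow one subtree to a leaf, recording each test taken as a readable condition."""
    path = []
    while "feature" in node:
        f, t = node["feature"], node["threshold"]
        if x[f] <= t:
            path.append(f"{f} <= {t}")
            node = node["left"]
        else:
            path.append(f"{f} > {t}")
            node = node["right"]
    return node["label"], path


def cascade_predict(subtrees, x):
    """Return (label, index of the firing subtree or None, explanation path)."""
    for i, subtree in enumerate(subtrees):
        label, path = classify_with_path(subtree, x)
        if label == 1:
            return 1, i, path  # stop at the first positive and explain with this path only
    return 0, None, []  # assumption: negative if no subtree fires


# Hypothetical cascade of two small subtrees
subtrees = [
    {"feature": "x1", "threshold": 5, "left": {"label": 0}, "right": {"label": 1}},
    {"feature": "x2", "threshold": 2, "left": {"label": 1},
     "right": {"feature": "x3", "threshold": 0, "left": {"label": 0}, "right": {"label": 1}}},
]
print(cascade_predict(subtrees, {"x1": 7, "x2": 9, "x3": 0}))
```

For the sample above, the first subtree fires, so the explanation is the single condition `x1 > 5` rather than a longer path through a monolithic tree.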
Random Forests Classifiers in Python
If you are not yet familiar with Tree-Based Models in Machine Learning, you should take a look at our R course on the subject. Let's understand the algorithm in layman's terms. Suppose you want to go on a trip and you would like to travel to a place which you will enjoy. So what do you do to find a place that you will like? You can search online, read reviews on travel blogs and portals, or you can also ask your friends.
Learning Binary Trees via Sparse Relaxation
Zantedeschi, Valentina, Kusner, Matt J., Niculae, Vlad
One of the most classical problems in machine learning is how to learn binary trees that split data into useful partitions. From classification/regression via decision trees to hierarchical clustering, binary trees are useful because they (a) are often easy to visualize; (b) make computationally efficient predictions; and (c) allow for flexible partitioning. Because of this, there has been extensive research on how to learn such trees, which generally falls into one of three categories: 1. greedy node-by-node optimization; 2. probabilistic relaxations for differentiability; 3. mixed-integer programs (MIP). Each of these has downsides: greedy methods can myopically choose poor splits, probabilistic relaxations lack principled ways to prune trees, and MIP methods can be slow on large problems and may not generalize. In this work we derive a novel sparse relaxation for binary tree learning. By deriving a new MIP and sparsely relaxing it, our approach is able to learn tree splits and tree pruning using argmin differentiation. We demonstrate that our approach is easily visualizable and competitive with current tree-based approaches in classification/regression and hierarchical clustering. Source code is available at http://github.com/vzantedeschi/LatentTrees .
The IBM Data Scientist Interview
IBM is a multinational technology company that was founded in 1911 and operates in over 170 countries worldwide. Today, IBM offers a wide spectrum of products and services, including software solutions, hardware architecture (server and storage architecture), business and technology services, and global financing solutions. As a data-driven company, IBM understands the importance of data and data analytics at every layer of the organization for driving better business decisions. As a leading provider of analytics and cloud-based solutions, IBM also offers a full stack of cloud-based products and services spanning data analytics, storage, AI, IoT, and blockchain.
Decision Tree Classification - A Practice problem
Parent and Child Node - A node that is divided into sub-nodes is called a parent node, and the sub-nodes it produces are called child nodes.
Subtree / Branch - When a sub-node splits further into sub-nodes, that entire part (one parent-child part) is called a subtree or branch. It is a part of the entire tree.
Decision Node - A sub-node that splits into further sub-nodes is called a decision node.
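These terms map directly onto a simple tree data structure. The sketch below is illustrative. The `Node` class, the helper `subtree_nodes`, and the weather-style node names are hypothetical. A node with children is a parent and a decision node, each child keeps a link to its parent, and the subtree (branch) under a node is that node plus all of its descendants.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # Excluded from repr/eq so that printing or comparing nodes does not recurse through the parent link.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self  # this node becomes the parent node of `child`
        self.children.append(child)
        return child

    def is_decision_node(self) -> bool:
        # A node that splits into further sub-nodes.
        return bool(self.children)


def subtree_nodes(node: Node) -> List[Node]:
    """All nodes in the subtree (branch) rooted at `node`, in pre-order."""
    nodes = [node]
    for child in node.children:
        nodes.extend(subtree_nodes(child))
    return nodes


root = Node("outlook")
sunny = root.add_child(Node("outlook = sunny"))
overcast = root.add_child(Node("outlook = overcast"))
high = sunny.add_child(Node("humidity = high"))
normal = sunny.add_child(Node("humidity = normal"))
print([n.name for n in subtree_nodes(sunny)])
```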