Decision Tree Learning


From local explanations to global understanding with explainable AI for trees

#artificialintelligence

Tree-based machine learning models such as random forests, decision trees and gradient boosted trees are popular nonlinear predictive models, yet comparatively little attention has been paid to explaining their predictions. Here we improve the interpretability of tree-based models through three main contributions: (1) a polynomial-time algorithm to compute optimal explanations based on game theory, (2) a new type of explanation that directly measures local feature interaction effects, and (3) a new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to (1) identify high-magnitude but low-frequency nonlinear mortality risk factors in the US population, (2) highlight distinct population subgroups with shared risk characteristics, (3) identify nonlinear interaction effects among risk factors for chronic kidney disease and (4) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.
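The method described here is implemented in the open-source shap package. As a minimal sketch of the workflow (the dataset and model below are illustrative choices, not the paper's medical data):

```python
# Local SHAP explanations for a tree ensemble, combined into a global view.
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # polynomial-time exact SHAP values for trees
shap_values = explainer.shap_values(X)  # one local explanation per prediction

# Averaging many local explanations yields a global importance ranking.
global_importance = np.abs(shap_values).mean(axis=0)
print(sorted(zip(X.columns, global_importance), key=lambda t: -t[1])[:5])
```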


Decision Tree Algorithm, Explained - KDnuggets

#artificialintelligence

In machine learning, classification is a two-step process: a learning step and a prediction step. In the learning step, the model is developed from the given training data. In the prediction step, the model is used to predict the response for new data. Decision Tree is one of the easiest and most popular classification algorithms to understand and interpret. The Decision Tree algorithm belongs to the family of supervised learning algorithms.
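A minimal sketch of the two steps with scikit-learn (dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)          # learning step: build the tree from training data
print(clf.predict(X_test[:5]))     # prediction step: respond to unseen data
print(clf.score(X_test, y_test))   # accuracy on the held-out set
```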


Understanding Decision Tree Classification with Scikit-Learn

#artificialintelligence

Gini impurity is named after the Italian statistician Corrado Gini. It can be understood as a criterion to minimize the probability of misclassification. For a node whose examples belong to class $k$ with proportion $p_k$, the Gini impurity is defined as $1 - \sum_k p_k^2$. To understand the definition and exactly how we build up a decision tree, let's start with a very simple data-set in which, depending on various weather conditions, we decide whether or not to play an outdoor game. From the definition, a data-set containing only one class has a Gini impurity of 0. In building up the decision tree, the idea is to choose as the root node the feature whose split yields the least Gini impurity, and so on. In this simple data-set, the decision on whether or not to play tennis is made from 4 features: Outlook, Temperature, Humidity and Wind.
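A short sketch of the computation; the split at the end is hypothetical, standing in for the play-tennis table:

```python
# Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k in a node.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes"] * 5))        # 0.0 -- a node with a single class is pure
print(gini(["yes", "no"] * 3))  # 0.5 -- a 50/50 node is maximally impure

def weighted_gini(groups):
    # Impurity of a candidate split: size-weighted average over its groups.
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

# Hypothetical split of 10 days by Outlook into sunny/overcast/rain groups:
split = [["no", "no", "yes"], ["yes", "yes", "yes"], ["yes", "no", "yes", "no"]]
print(weighted_gini(split))  # the feature with the lowest value becomes the root
```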


An Introduction to Random Forest with Python and scikit-learn

#artificialintelligence

NOTE: This post assumes a basic understanding of decision trees. If you need a refresher on how Decision Trees work, I recommend first reading An Introduction to Decision Trees with Python and scikit-learn. The good thing about Random Forest is that if we understand Decision Trees very well, it is very easy to understand Random Forest as well. The name Random Forest actually describes the extra features added quite well. Firstly, we now have something that is random, which I'll explain in more depth.
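As a minimal scikit-learn sketch, the two kinds of randomness the name hints at are bootstrap resampling per tree and random feature subsets per split (dataset and settings below are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,     # a "forest" of 100 decision trees
    bootstrap=True,       # each tree is trained on a bootstrap resample
    max_features="sqrt",  # each split considers a random subset of features
    random_state=0,
)
print(cross_val_score(forest, X, y, cv=5).mean())  # averaged votes generalize better
```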


Variance Penalizing AdaBoost

Neural Information Processing Systems

This paper proposes a novel boosting algorithm called VadaBoost, which is motivated by recent empirical Bernstein bounds. VadaBoost iteratively minimizes a cost function that balances the sample mean and the sample variance of the exponential loss. Each step of the proposed algorithm minimizes the cost efficiently by providing weighted data to a weak learner rather than requiring a brute-force evaluation of all possible weak learners. Thus, the proposed algorithm overcomes a key limitation of previous empirical Bernstein boosting methods, which required brute-force enumeration of all possible weak learners. Experimental results confirm that the new algorithm achieves the performance improvements of EBBoost yet goes beyond decision stumps to handle any weak learner.
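To make the objective concrete, here is an illustrative evaluation of a mean-plus-variance penalized exponential loss; the exact functional form and the trade-off weight `lam` are assumptions for exposition, not the paper's precise cost or update rule:

```python
import numpy as np

def exp_loss(y, margin):
    # Exponential loss of an ensemble with margin F(x); labels y are in {-1, +1}.
    return np.exp(-y * margin)

def variance_penalized_cost(y, margin, lam=0.5):
    # Balances sample mean and sample variance of the exponential loss,
    # in the spirit of empirical Bernstein bounds (illustrative form only).
    losses = exp_loss(y, margin)
    return (1 - lam) * losses.mean() + lam * losses.var()

y = np.array([1, -1, 1, 1, -1])
margin = np.array([0.8, -0.3, 1.2, -0.1, -0.9])  # toy ensemble outputs
print(variance_penalized_cost(y, margin))
```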


(RF)^2 -- Random Forest Random Field

Neural Information Processing Systems

We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field, (RF)^2. Inference in (RF)^2 uses the Swendsen-Wang cut algorithm, characterized by Metropolis-Hastings jumps. A jump from one state to another depends on the ratio of the proposal distributions and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions and then computes their ratios. Our key idea is to instead directly estimate these ratios using RF, which collects the class histograms of training examples in the leaf nodes of each decision tree.
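The acceptance rule the abstract alludes to is the standard Metropolis-Hastings one; a tiny sketch with stand-in ratio values (in (RF)^2 these ratios would come from the forest's leaf histograms rather than parametric models):

```python
import random

def mh_accept(proposal_ratio, posterior_ratio):
    # proposal_ratio  = q(s | s') / q(s' | s)
    # posterior_ratio = p(s' | data) / p(s | data)
    # The jump s -> s' is accepted with probability min(1, product of the ratios).
    return random.random() < min(1.0, proposal_ratio * posterior_ratio)

print(mh_accept(proposal_ratio=0.8, posterior_ratio=1.5))
```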


Graph-Valued Regression

Neural Information Processing Systems

In many applications, it is of interest to model $Y$ given another random vector $X$ as input. We refer to the problem of estimating the graph $G(x)$ of $Y$ conditioned on $X = x$ as "graph-valued regression". In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method "Graph-optimized CART", or Go-CART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency.
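Under stated assumptions, the idea can be sketched with off-the-shelf pieces: a shallow regression tree to partition the $X$ space and a graphical lasso per leaf, both stand-ins for the paper's dyadic partitioning and tuning:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))   # conditioning vector
Y = rng.normal(size=(400, 5))   # vector whose graph varies with X
on = X[:, 0] > 0
Y[on, 1] += 0.9 * Y[on, 0]      # edge (0, 1) exists only where x_0 > 0
Y[:, 0] += 2.0 * on             # mean shift so the tree can find the split

tree = DecisionTreeRegressor(max_depth=1, min_samples_leaf=50).fit(X, Y)
leaves = tree.apply(X)          # leaf index per sample = partition cell
for leaf in np.unique(leaves):
    gl = GraphicalLasso(alpha=0.2).fit(Y[leaves == leaf])
    print(leaf, abs(gl.precision_[0, 1]) > 1e-4)  # is the (0, 1) edge detected?
```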


Learning Partially Observable Models Using Temporally Abstract Decision Trees

Neural Information Processing Systems

This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game).
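The contrast with suffix-based methods can be illustrated with two toy feature functions; the helper names and event symbols are invented for illustration:

```python
def suffix_feature(history, k, symbol):
    # Suffix-based split feature: looks only at the last k observations.
    return symbol in history[-k:]

def temporally_abstract_feature(history, symbol):
    # Temporally abstract split feature: an event arbitrarily far in the past.
    return symbol in history

history = ["door_opened"] + ["corridor"] * 50  # the key event happened long ago
print(suffix_feature(history, k=5, symbol="door_opened"))   # False: outside suffix
print(temporally_abstract_feature(history, "door_opened"))  # True: still visible
```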


Provably robust boosted decision stumps and trees against adversarial attacks

Neural Information Processing Systems

The problem of adversarial robustness has been studied extensively for neural networks. However, for boosted decision trees and decision stumps there are almost no results, even though they are widely used in practice (e.g. XGBoost). We show in this paper that for boosted decision stumps the exact min-max robust loss and test error for an $l_\infty$-attack can be computed in $O(T\log T)$ time per input, where $T$ is the number of decision stumps, and the optimal update step of the ensemble can be done in $O(n^2\,T\log T)$, where $n$ is the number of data points. Moreover, the robust test error rates we achieve are competitive with those of provably robust convolutional networks.
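The single-stump case conveys the intuition: if the $\epsilon$-interval around the attacked coordinate straddles the stump's threshold, the adversary picks whichever side hurts the label most. A sketch with hypothetical stump notation $f(x) = w_l$ if $x_j \le t$, else $w_r$:

```python
def robust_stump_margin(x_j, y, t, w_left, w_right, eps):
    # Minimal margin y * f(x') over all perturbations |x'_j - x_j| <= eps.
    reachable = []
    if x_j - eps <= t:
        reachable.append(w_left)   # adversary can place x_j on the left side
    if x_j + eps > t:
        reachable.append(w_right)  # ... or on the right side
    return min(y * w for w in reachable)

print(robust_stump_margin(x_j=0.9, y=+1, t=1.0, w_left=-0.5, w_right=0.7, eps=0.2))
# -0.5: the point can be pushed across the threshold, flipping the stump
```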


Robustness Verification of Tree-based Models

Neural Information Processing Systems

We study the robustness verification problem for tree-based models, including random forests (RFs) and gradient boosted decision trees (GBDTs). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound on it. Existing approaches cast this verification problem as a mixed integer linear programming (MILP) problem, which finds the minimal adversarial distortion in exponential time and is thus impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear-time algorithm for verifying a single tree, and that for tree ensembles the verification problem can be cast as a max-clique problem on a multi-partite boxicity graph.
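The single-tree case is easy to see: every node is visited at most once while collecting the leaves reachable within the $l_\infty$ ball, so the traversal is linear in the tree size. A sketch with an assumed dict-based node layout:

```python
def worst_case_leaf(node, x, eps):
    # Minimal leaf value reachable by perturbing x within an l_inf eps-ball.
    if "value" in node:                  # leaf node
        return node["value"]
    j, t = node["feature"], node["threshold"]
    vals = []
    if x[j] - eps <= t:                  # left child reachable
        vals.append(worst_case_leaf(node["left"], x, eps))
    if x[j] + eps > t:                   # right child reachable
        vals.append(worst_case_leaf(node["right"], x, eps))
    return min(vals)

tree = {"feature": 0, "threshold": 0.5,
        "left": {"value": -1.0},
        "right": {"feature": 1, "threshold": 0.0,
                  "left": {"value": 0.2}, "right": {"value": 1.0}}}
print(worst_case_leaf(tree, x=[0.6, 0.1], eps=0.2))  # -1.0: root split can be crossed
```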