AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Sparse Learning with CART

Neural Information Processing SystemsOct-10-2024, 17:06:14 GMT

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with CART. In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem. We leverage this connection between the training error and Pearson correlation to show that CART with cost-complexity pruning achieves an optimal complexity/goodness-of-fit tradeoff when the depth scales with the logarithm of the sample size.

pearson correlation, regression model, sparse learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Joints in Random Forests

Neural Information Processing SystemsOct-10-2024, 16:15:17 GMT

Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation.

detect outlier, generative model, random forest, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Smooth And Consistent Probabilistic Regression Trees

Neural Information Processing SystemsOct-10-2024, 16:14:12 GMT

We propose here a generalization of regression trees, referred to as Probabilistic Regression (PR) trees, that adapt to the smoothness of the prediction function relating input and output variables while preserving the interpretability of the prediction and being robust to noise. In PR trees, an observation is associated to all regions of a tree through a probability distribution that reflects how far the observation is to a region. We show that such trees are consistent, meaning that their error tends to 0 when the sample size tends to infinity, a property that has not been established for similar, previous proposals as Soft trees and Smooth Transition Regression trees. We further explain how PR trees can be used in different ensemble methods, namely Random Forests and Gradient Boosted Trees. Lastly, we assess their performance through extensive experiments that illustrate their benefits in terms of performance, interpretability and robustness to noise.

consistent probabilistic regression tree, noise, pr tree

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

A Scalable Deterministic Global Optimization Algorithm for Training Optimal Decision Tree

Neural Information Processing SystemsOct-10-2024, 15:27:26 GMT

The training of optimal decision tree via mixed-integer programming (MIP) has attracted much attention in recent literature. However, for large datasets, state-of-the-art approaches struggle to solve the optimal decision tree training problems to a provable global optimal solution within a reasonable time. In this paper, we reformulate the optimal decision tree training problem as a two-stage optimization problem and propose a tailored reduced-space branch and bound algorithm to train optimal decision tree for the classification tasks with continuous features. The computation of bounds can be decomposed into the solution of many small-scale subproblems and can be naturally parallelized. With these bounding methods, we prove that our algorithm can converge by branching only on variables representing the optimal decision tree structure, which is invariant to the size of datasets. Moreover, we propose a novel sample reduction method that can predetermine the cost of part of samples at each BB node.

dataset, scalable deterministic global optimization algorithm, training optimal decision tree, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Universal guarantees for decision tree induction via a higher-order splitting criterion

Neural Information Processing SystemsOct-10-2024, 11:20:55 GMT

We propose a simple extension of {\sl top-down decision tree learning heuristics} such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions f: \{-1,1\} n \to \{-1,1\} with respect to the uniform distribution, circumventing impossibility results showing that existing heuristics fare poorly even for simple target functions. The crux of our extension is a new splitting criterion that takes into account the correlations between f and {\sl small subsets} of its attributes. Gini impurity and information gain), in contrast, are based solely on the correlations between f and its {\sl individual} attributes. Our algorithm satisfies the following guarantee: for all target functions f: \{-1,1\} n \to \{-1,1\}, sizes s\in \N, and error parameters \eps, it constructs a decision tree of size s {\tilde{O}((\log s) 2/\eps 2)} that achieves error \le O(\opt_s) \eps, where \opt_s denotes the error of the optimal size- s decision tree for f .

decision tree induction, higher-order splitting criterion, universal guarantee, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Towards Convergence Rate Analysis of Random Forests for Classification

Neural Information Processing SystemsOct-10-2024, 10:30:37 GMT

Random forests have been one of the successful ensemble algorithms in machine learning. The basic idea is to construct a large number of random trees individually and make prediction based on an average of their predictions. The great successes have attracted much attention on the consistency of random forests, mostly focusing on regression. This work takes one step towards convergence rates of random forests for classification. We present the first finite-sample rate O(n {-1/(8d 2)}) on the convergence of pure random forests for classification, which can be improved to be of O(n {-1/(3.87d

classification, convergence rate analysis, random forest, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Optimal Decision Tree with Noisy Outcomes

Neural Information Processing SystemsOct-10-2024, 10:30:33 GMT

A fundamental task in active learning involves performing a sequence of tests to identify an unknown hypothesis that is drawn from a known distribution. This problem, known as optimal decision tree induction, has been widely studied for decades and the asymptotically best-possible approximation algorithm has been devised for it. We study a generalization where certain test outcomes are noisy, even in the more general case when the noise is persistent, i.e., repeating the test on the scenario gives the same noisy output, disallowing simple repetition as a way to gain confidence. We design new approximation algorithms for both the non-adaptive setting, where the test sequence must be fixed a-priori, and the adaptive setting where the test sequence depends on the outcomes of prior tests. Previous work in the area assumed at most a constant number of noisy outcomes per test and per scenario and provided approximation ratios that were problem dependent (such as the minimum probability of a hypothesis).

hypothesis, noisy outcome, optimal decision tree, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.64)

Add feedback

Partitioning Structure Learning for Segmented Linear Regression Trees

Neural Information Processing SystemsOct-10-2024, 08:51:20 GMT

This paper proposes a partitioning structure learning method for segmented linear regression trees (SLRT), which assigns linear predictors over the terminal nodes. The recursive partitioning process is driven by an adaptive split selection algorithm that maximizes, at each node, a criterion function based on a conditional Kendall's τ statistic that measures the rank dependence between the regressors and the fit- ted linear residuals. Theoretical analysis shows that the split selection algorithm permits consistent identification and estimation of the unknown segments. A suffi- ciently large tree is induced by applying the split selection algorithm recursively. Then the minimal cost-complexity tree pruning procedure is applied to attain the right-sized tree, that ensures (i) the nested structure of pruned subtrees and (ii) consistent estimation to the number of segments.

partitioning structure learning, predictor, segmented linear regression tree, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.65)

Add feedback

A Debiased MDI Feature Importance Measure for Random Forests

Neural Information Processing SystemsOct-10-2024, 06:44:57 GMT

Tree ensembles such as Random Forests have achieved impressive empirical success across a wide variety of applications. To understand how these models make predictions, people routinely turn to feature importance measures calculated from tree ensembles. It has long been known that Mean Decrease Impurity (MDI), one of the most widely used measures of feature importance, incorrectly assigns high importance to noisy features, leading to systematic bias in feature selection. In this paper, we address the feature selection bias of MDI from both theoretical and methodological perspectives. Based on the original definition of MDI by Breiman et al. \cite{Breiman1984} for a single tree, we derive a tight non-asymptotic bound on the expected bias of MDI importance of noisy features, showing that deep trees have higher (expected) feature selection bias than shallow ones.

debiased mdi feature importance measure, feature selection bia, random forest, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Add feedback

Estimating decision tree learnability with polylogarithmic sample complexity

Neural Information Processing SystemsOct-10-2024, 01:35:50 GMT

We show that top-down decision tree learning heuristics (such as ID3, C4.5, and CART) are amenable to highly efficient {\sl learnability estimation}: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with {\sl polylogarithmically} many labeled examples, exponentially smaller than the number necessary to run these heuristics, and indeed, exponentially smaller than information-theoretic minimum required to learn a good decision tree. This adds to a small but growing list of fundamental learning algorithms that have been shown to be amenable to learnability estimation. En route to this result, we design and analyze sample-efficient {\sl minibatch} versions of top-down decision tree learning heuristics and show that they achieve the same provable guarantees as the full-batch versions. We further give active local'' versions of these heuristics: given a test point x \star, we show how the label T(x \star) of the decision tree hypothesis T can be computed with polylogarithmically many labeled examples, exponentially smaller than the number necessary to learn T .

decision tree learnability, learnability estimation, polylogarithmic sample complexity, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback