AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Generalization Properties of Decision Trees on Real-valued and Categorical Features

Leboeuf, Jean-Samuel, LeBlanc, Frédéric, Marchand, Mario

arXiv.org Artificial IntelligenceOct-18-2022

We revisit binary decision trees from the perspective of partitions of the data. We introduce the notion of partitioning function, and we relate it to the growth function and to the VC dimension. We consider three types of features: real-valued, categorical ordinal and categorical nominal, with different split rules for each. For each feature type, we upper bound the partitioning function of the class of decision stumps before extending the bounds to the class of general decision tree (of any fixed structure) using a recursive approach. Using these new results, we are able to find the exact VC dimension of decision stumps on examples of $\ell$ real-valued features, which is given by the largest integer $d$ such that $2\ell \ge \binom{d}{\lfloor\frac{d}{2}\rfloor}$. Furthermore, we show that the VC dimension of a binary tree structure with $L_T$ leaves on examples of $\ell$ real-valued features is in $O(L_T \log(L_T\ell))$. Finally, we elaborate a pruning algorithm based on these results that performs better than the cost-complexity and reduced-error pruning algorithms on a number of data sets, with the advantage that no cross-validation is required.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2210.10781

Country:

North America > Canada > Quebec (0.04)
North America > United States > Wisconsin (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

A Mixing Time Lower Bound for a Simplified Version of BART

Ronen, Omer, Saarinen, Theo, Tan, Yan Shuo, Duncan, James, Yu, Bin

arXiv.org Artificial IntelligenceOct-17-2022

Decision tree models such as CART (Breiman et al., 1984) and their ensembles such as Random Forests (Breiman, 2001) and Gradient Boosted Trees (Chen & Guestrin, 2016; Friedman, 2001) have proved to be enormously successful supervised learning algorithms, because they are able to combine non-parametric model fitting with implicit dimension reduction. It is often difficult to quantify the uncertainty of their predictions and due to their greedy local splitting criteria, there is no guarantee for the optimality of the constructed decision trees. An alternative approach is to construct the decision trees in a Bayesian manner (H. A. Chipman et al., 1998; Denison et al., 1998; Wu et al., 2007) To address these issues, H. A. Chipman et al., 1998 proposed a Bayesian adaptation of CART, Bayesian CART, and later, a sum of Bayesian CART trees, which they called Bayesian Additive Regression Trees (BART) (H. A. Chipman et al., 2010). One perspective views these algorithms as non-greedy stochastic versions of their deterministic equivalents, where the randomness inside the fitting process allows the algorithm to explore the space of possible decision trees in ways the CART algorithm cannot. An alternative perspective views these algorithms as Bayesian non-parametric regression models, in which we put a prior on the space of decision trees, assume a likelihood for the observed data, and then obtain a posterior distribution over the possible decision trees based on the training data. The posterior distribution can be used to provide posterior predictive credible intervals and other forms of uncertainty quantification.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2210.09352

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)

Add feedback

Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization

Wilton, Jonathan, Koay, Abigail M. Y., Ko, Ryan K. L., Xu, Miao, Ye, Nan

arXiv.org Artificial IntelligenceOct-16-2022

The need to learn from positive and unlabeled data, or PU learning, arises in many applications and has attracted increasing interest. While random forests are known to perform well on many tasks with positive and negative data, recent PU algorithms are generally based on deep neural networks, and the potential of tree-based PU learning is under-explored. In this paper, we propose new random forest algorithms for PU-learning. Key to our approach is a new interpretation of decision tree algorithms for positive and negative data as \emph{recursive greedy risk minimization algorithms}. We extend this perspective to the PU setting to develop new decision tree learning algorithms that directly minimizes PU-data based estimators for the expected risk. This allows us to develop an efficient PU random forest algorithm, PU extra trees. Our approach features three desirable properties: it is robust to the choice of the loss function in the sense that various loss functions lead to the same decision trees; it requires little hyperparameter tuning as compared to neural network based PU learning; it supports a feature importance that directly measures a feature's contribution to risk minimization. Our algorithms demonstrate strong performance on several datasets. Our code is available at \url{https://github.com/puetpaper/PUExtraTrees}.

artificial intelligence, decision tree learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2210.08461

Country:

Oceania > Australia > Queensland (0.04)
Asia > Japan (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

The Lepto-Variance of Stock Returns

Polimenis, Vassilis

arXiv.org Artificial IntelligenceOct-16-2022

The Regression Tree (RT) sorts the samples using a specific feature and finds the split point that produces the maximum variance reduction from a node to its children. Our key observation is that the best factor to use (in terms of MSE drop) is always the target itself, as this most clearly separates the target. Thus using the target as the splitting factor provides an upper bound on MSE drop (or lower bound on the residual children MSE). Based on this observation, we define the k-bit lepto-variance ${\lambda}k^2$ of a target variable (or equivalently the lepto-variance at a specific depth k) as the variance that cannot be removed by any regression tree of a depth equal to k. As the upper bound performance for any feature, we believe ${\lambda}k^2$ to be an interesting statistical concept related to the underlying structure of the sample as it quantifies the resolving power of the RT for the sample. The max variance that may be explained using RTs of depth up to k is called the sample k-bit macro-variance. At any depth, total sample variance is thus decomposed into lepto-variance ${\lambda}^2$ and macro-variance ${\mu}^2$. We demonstrate the concept, by performing 1- and 2-bit RT based lepto-structure analysis for daily IBM stock returns.

artificial intelligence, machine learning, return vector, (15 more...)

arXiv.org Artificial Intelligence

2207.04867

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Creating an Ensemble Voting Classifier with Scikit-Learn

#artificialintelligenceOct-15-2022, 02:55:25 GMT

Classification ensemble models are those composed by many models fitted to the same data, where the result for the classification can be the majority's vote, an average of the results, or the best performing model result. In Figure 1, there is an example of the voting classifier that we are going to build in this quick tutorial. Observe that there are three models fitted to the data. Two of them classified the data as 1, while one classified as 0. So, by the majority's vote, class 1 wins, and that is the result. In Scikit-Learn, a commonly used example of ensemble model is the Random Forest classifier.

classifier, scikit-learn, voting classifier, (10 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.42)

Add feedback

How to Build a Machine Learning Model in Rust

#artificialintelligenceOct-14-2022, 11:02:49 GMT

Machine learning is a really interesting concept in computer programming. It involves using data to train a computer program to carry out tasks. During the process, the program learns from data by discovering patterns. This reduces the need for programmers to hard code rules in some applications. Languages like Python and R are excellent for learning and carrying out machine learning tasks.

algorithm, dataset, decision tree, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.37)

Add feedback

Superpolynomial Lower Bounds for Decision Tree Learning and Testing

Koch, Caleb, Strassle, Carmen, Tan, Li-Yang

arXiv.org Artificial IntelligenceOct-12-2022

We establish new hardness results for decision tree optimization problems, adding to a line of work that dates back to Hyafil and Rivest in 1976. We prove, under randomized ETH, superpolynomial lower bounds for two basic problems: given an explicit representation of a function $f$ and a generator for a distribution $\mathcal{D}$, construct a small decision tree approximator for $f$ under $\mathcal{D}$, and decide if there is a small decision tree approximator for $f$ under $\mathcal{D}$. Our results imply new lower bounds for distribution-free PAC learning and testing of decision trees, settings in which the algorithm only has restricted access to $f$ and $\mathcal{D}$. Specifically, we show: $n$-variable size-$s$ decision trees cannot be properly PAC learned in time $n^{\tilde{O}(\log\log s)}$, and depth-$d$ decision trees cannot be tested in time $\exp(d^{\,O(1)})$. For learning, the previous best lower bound only ruled out $\text{poly}(n)$-time algorithms (Alekhnovich, Braverman, Feldman, Klivans, and Pitassi, 2009). For testing, recent work gives similar though incomparable bounds in the setting where $f$ is random and $\mathcal{D}$ is nonexplicit (Blais, Ferreira Pinto Jr., and Harms, 2021). Assuming a plausible conjecture on the hardness of Set-Cover, we show our lower bound for learning decision trees can be improved to $n^{\Omega(\log s)}$, matching the best known upper bound of $n^{O(\log s)}$ due to Ehrenfeucht and Haussler (1989). We obtain our results within a unified framework that leverages recent progress in two lines of work: the inapproximability of Set-Cover and XOR lemmas for query complexity. Our framework is versatile and yields results for related concept classes such as juntas and DNF formulas.

artificial intelligence, decision tree, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2210.06375

Country:

Europe > Germany (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre: Research Report (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

LARF: Two-level Attention-based Random Forests with a Mixture of Contamination Models

Konstantinov, Andrei V., Utkin, Lev V.

arXiv.org Artificial IntelligenceOct-11-2022

New models of the attention-based random forests called LARF (Leaf Attention-based Random Forest) are proposed. The first idea behind the models is to introduce a two-level attention, where one of the levels is the "leaf" attention and the attention mechanism is applied to every leaf of trees. The second level is the tree attention depending on the "leaf" attention. The second idea is to replace the softmax operation in the attention with the weighted sum of the softmax operations with different parameters. It is implemented by applying a mixture of the Huber's contamination models and can be regarded as an analog of the multi-head attention with "heads" defined by selecting a value of the softmax parameter. Attention parameters are simply trained by solving the quadratic optimization problem. To simplify the tuning process of the models, it is proposed to make the tuning contamination parameters to be training and to compute them by solving the quadratic optimization problem. Many numerical experiments with real datasets are performed for studying LARFs. The code of proposed algorithms can be found in https://github.com/andruekonst/leaf-attention-forest.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.05168

Country:

Asia > Russia (0.28)
North America > United States > New York (0.04)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
Europe > France > Hauts-de-France > Oise > Compiègne (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Local Interpretable Model Agnostic Shap Explanations for machine learning models

Aditya, P. Sai Ram, Pal, Mayukha

arXiv.org Artificial IntelligenceOct-10-2022

With the advancement of technology for artificial intelligence (AI) based solutions and analytics compute engines, machine learning (ML) models are getting more complex day by day. Most of these models are generally used as a black box without user interpretability. Such complex ML models make it more difficult for people to understand or trust their predictions. There are variety of frameworks using explainable AI (XAI) methods to demonstrate explainability and interpretability of ML models to make their predictions more trustworthy. In this manuscript, we propose a methodology that we define as Local Interpretable Model Agnostic Shap Explanations (LIMASE). This proposed ML explanation technique uses Shapley values under the LIME paradigm to achieve the following (a) explain prediction of any model by using a locally faithful and interpretable decision tree model on which the Tree Explainer is used to calculate the shapley values and give visually interpretable explanations.

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2210.04533

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > India > Telangana (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.69)

Add feedback

Risk Automatic Prediction for Social Economy Companies using Camels

Gallego-Mejia, Joseph, Martin-Vega, Daniela, Gonzalez, Fabio

arXiv.org Artificial IntelligenceOct-10-2022

Governments have to supervise and inspect social economy enterprises (SEEs). However, inspecting all SEEs is not possible due to the large number of SEEs and the low number of inspectors in general. We proposed a prediction model based on a machine learning approach. The method was trained with the random forest algorithm with historical data provided by each SEE. Three consecutive periods of data were concatenated. The proposed method uses these periods as input data and predicts the risk of each SEE in the fourth period. The model achieved 76\% overall accuracy. In addition, it obtained good accuracy in predicting the high risk of a SEE. We found that the legal nature and the variation of the past-due portfolio are good predictors of the future risk of a SEE. Thus, the risk of a SEE in a future period can be predicted by a supervised machine learning method. Predicting the high risk of a SEE improves the daily work of each inspector by focusing only on high-risk SEEs.

artificial intelligence, consecutive period, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2210.05052

Country:

North America > United States (0.28)
Asia > India (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)
(2 more...)

Genre: Research Report (0.65)

Industry: Banking & Finance > Trading (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.39)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Add feedback