AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Sub-Setting Algorithm for Training Data Selection in Pattern Recognition

Arwade, AGaurav, Olafsson, Sigurdur

arXiv.org Machine LearningOct-13-2021

Modern pattern recognition tasks use complex algorithms that take advantage of large datasets to make more accurate predictions than traditional algorithms such as decision trees or k-nearest-neighbor better suited to describe simple structures. While increased accuracy is often crucial, less complexity also has value. This paper proposes a training data selection algorithm that identifies multiple subsets with simple structures. A learning algorithm trained on such a subset can classify an instance belonging to the subset with better accuracy than the traditional learning algorithms. In other words, while existing pattern recognition algorithms attempt to learn a global mapping function to represent the entire dataset, we argue that an ensemble of simple local patterns may better describe the data. Hence the sub-setting algorithm identifies multiple subsets with simple local patterns by identifying similar instances in the neighborhood of an instance. This motivation has similarities to that of gradient boosted trees but focuses on the explainability of the model that is missing for boosted trees. The proposed algorithm thus balances accuracy and explainable machine learning by identifying a limited number of subsets with simple structures. We applied the proposed algorithm to the international stroke dataset to predict the probability of survival. Our bottom-up sub-setting algorithm performed on an average 15% better than the top-down decision tree learned on the entire dataset. The different decision trees learned on the identified subsets use some of the previously unused features by the whole dataset decision tree, and each subset represents a distinct population of data.

algorithm, decision tree, subset, (13 more...)

arXiv.org Machine Learning

2110.06527

Country:

North America > United States > Iowa > Story County > Ames (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

Add feedback

Feature selection with Random Forest

#artificialintelligenceOct-11-2021, 16:25:26 GMT

Random Forest is a supervised model that implements both decision trees and the bagging method. The idea is that the training dataset is resampled according to a procedure called "bootstrap". Each sample contains a random subset of the original columns and is used to fit a decision tree. The number of models and the number of columns are hyperparameters to be optimized. Finally, the predictions of the trees are mixed together calculating the mean value (for regression) or using soft voting (for classification).

feature importance, random forest, selection, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Gini Impurity vs Information Gain vs Chi-Square - Methods for Decision Tree Split

#artificialintelligenceOct-9-2021, 22:50:46 GMT

Decision trees are one of the most used machine learning models because of their ease of implementation and simple interpretations. To better learn from the data they are applied to, the nodes of the decision trees need to be split based on the attributes of the data. In this article, we will understand the need of splitting a decision tree along with the methods used to split the tree nodes. Gini impurity, information gain and chi-square are the three most used methods for splitting the decision trees. Here we will discuss these three methods and will try to find out their importance in specific cases.

decision tree, information gain, node, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Understanding Random Forest's hyperparameters with images

#artificialintelligenceOct-7-2021, 18:40:18 GMT

Decision Tree is a disseminated algorithm to solve problems. It tries to simulate the human thinking process by binarizing each step of the decision. So, at each step, the algorithm chooses between True or False to move forward. That algorithm is simple, yet very powerful, thus widely applied in machine learning models. However, one of the problems with Decision Trees is its difficulty in generalizing a problem.

algorithm, hyperparameter, random forest, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)

Add feedback

Decision Tree Algorithm

#artificialintelligenceOct-7-2021, 08:30:11 GMT

Decision Tree is a Supervised literacy manner that can be used for both group and Reversion cases, but mostly it's preferred for solving Set problems. It's a tree-structured classifier, where interior bumps represent the features of a dataset, branches character the decision rules and each slice bump represents the outcome. In a Decision tree, there are two nodes, which are the Decision Nodule and Leaf Node. Decision nodules are used to make any decision and have multiple branches, whereas Leaf nodules are the output of those judgments and don't contain any fresh branches. The diagnoses or the test are performed on the keystone of features of the given dataset.

dataset, decision tree algorithm, nodule, (1 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)

Add feedback

Automated Testing of AI Models

Haldar, Swagatam, Vijaykeerthy, Deepak, Saha, Diptikalyan

arXiv.org Artificial IntelligenceOct-7-2021

The last decade has seen tremendous progress in AI technology and applications. With such widespread adoption, ensuring the reliability of the AI models is crucial. In past, we took the first step of creating a testing framework called AITEST for metamorphic properties such as fairness, robustness properties for tabular, time-series, and text classification models. In this paper, we extend the capability of the AITEST tool to include the testing techniques for Image and Speech-to-text models along with interpretability testing for tabular models. These novel extensions make AITEST a comprehensive framework for testing AI models.

international conference, noise, transformation, (16 more...)

arXiv.org Artificial Intelligence

2110.0332

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Information Technology (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.88)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.48)

Add feedback

Foundations of Symbolic Languages for Model Interpretability

Arenas, Marcelo, Baez, Daniel, Barceló, Pablo, Pérez, Jorge, Subercaseaux, Bernardo

arXiv.org Artificial IntelligenceOct-5-2021

Several queries and scores have been proposed to explain individual predictions made by ML models. Examples include queries based on "anchors", which are parts of an instance that are sufficient to justify its classification, and "featureperturbation" scores such as SHAP. Given the need for flexible, reliable, and easy-toapply interpretability methods for ML models, we foresee the need for developing declarative languages to naturally specify different explainability queries. We do this in a principled way by rooting such a language in a logic called FOIL, that allows for expressing many simple but important explainability queries, and might serve as a core for more expressive interpretability languages. We study the computational complexity of FOIL queries over classes of ML models often deemed to be easily interpretable: decision trees and more general decision diagrams. Since the number of possible inputs for an ML model is exponential in its dimension, tractability of the FOIL evaluation problem is delicate, but can be achieved by either restricting the structure of the models, or the fragment of FOIL being evaluated. We also present a prototype implementation of FOIL wrapped in a high-level declarative language, and perform experiments showing that such a language can be used in practice.

decision tree, foil, query, (16 more...)

arXiv.org Artificial Intelligence

2110.02376

Country:

South America > Chile (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Treeging

Watson, Gregory L., Jerrett, Michael, Reid, Colleen E., Telesca, Donatello

arXiv.org Machine LearningOct-3-2021

Treeging combines the flexible mean structure of regression trees with the covariance-based prediction strategy of kriging into the base learner of an ensemble prediction algorithm. In so doing, it combines the strengths of the two primary types of spatial and space-time prediction models: (1) models with flexible mean structures (often machine learning algorithms) that assume independently distributed data, and (2) kriging or Gaussian Process (GP) prediction models with rich covariance structures but simple mean structures. We investigate the predictive accuracy of treeging across a thorough and widely varied battery of spatial and space-time simulation scenarios, comparing it to ordinary kriging, random forest and ensembles of ordinary kriging base learners. Treeging performs well across the board, whereas kriging suffers when dependence is weak or in the presence of spurious covariates, and random forest suffers when the covariates are less informative. Treeging also outperforms these competitors in predicting atmospheric pollutants (ozone and PM$_{2.5}$) in several case studies. We examine sensitivity to tuning parameters (number of base learners and training data sampling proportion), finding they follow the familiar intuition of their random forest counterparts. We include a discussion of scaleability, noting that any covariance approximation techniques that expedite kriging (GP) may be similarly applied to expedite treeging.

covariate, prediction, random forest, (13 more...)

arXiv.org Machine Learning

2110.01053

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
South America > French Guiana (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)

Genre: Research Report > New Finding (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

What's in a "Random Forest"? Predicting Diabetes

#artificialintelligenceOct-2-2021, 05:50:46 GMT

If you've heard of "random forests" as a hot, sexy machine learning algorithm and you want to implement it, great! But if you're not sure exactly what happens in a random forest, or how random forests make their classification decisions, then read on:) We'll find that we can break down random forests into smaller, more digestible pieces. As a forest is made of trees, so a random forest is made of a bunch of randomly sampled sub-components called decision trees. So first let's try to understand what a decision tree is, and how it comes to its prediction. For now, we'll just look at classification decision trees.

decision tree, diabetes, random forest

#artificialintelligence

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

How to Mitigate Overfitting by Creating Ensembles

#artificialintelligenceOct-1-2021, 12:35:38 GMT

If we summarize what we've done so far in the "Addressing the problem of overfitting" article series, we've discussed three different techniques that can be used to mitigate overfitting. As you already know, Cross-validation (discussed in Part 1), Regularization (discussed in Part 2) and Dimensionality Reduction (discussed in Part 3) can effectively mitigate overfitting. In Part 4, today we discuss another useful technique called Creating Ensembles. However, this technique is limited to tree-based models. Someone can attempt to build a decision tree model (Step 1) without limiting the tree growth (without early stopping or without doing any hyperparameter tuning).

creating ensemble, decision tree, random forest, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback