AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
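The quoted definition can be made concrete with a small sketch. The Python below is illustrative only. The attributes `a` and `b` and the `Leaf`/`Node` classes are hypothetical and are not taken from the book. The tree outputs a yes/no decision and encodes XOR, a Boolean function that requires a test on `b` under each branch of `a`.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    decision: bool  # the yes/no output stored at this leaf


@dataclass
class Node:
    attribute: str  # name of the Boolean property tested at this node
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def decide(tree: Tree, example: Dict[str, bool]) -> bool:
    """Follow the branching tests from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if example[tree.attribute] else tree.if_false
    return tree.decision


# Hypothetical tree computing XOR(a, b): test a first, then b on each branch.
xor_tree = Node(
    "a",
    if_true=Node("b", if_true=Leaf(False), if_false=Leaf(True)),
    if_false=Node("b", if_true=Leaf(True), if_false=Leaf(False)),
)

for a in (False, True):
    for b in (False, True):
        print(a, b, decide(xor_tree, {"a": a, "b": b}))
```

A leaf could store any value instead of `True`/`False`, so the same structure also covers the larger output ranges mentioned in the quote.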

WildWood: a new Random Forest algorithm

Gaïffas, Stéphane, Merad, Ibrahim, Yu, Yiyang

arXiv.org Machine Learning, Sep-16-2021

We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregating with exponential weights computed on out-of-bag samples; these weights are computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.
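The aggregation step can be illustrated with a minimal Python sketch. It assumes a handful of candidate subtrees whose out-of-bag losses are already known, a uniform prior over them, and a hypothetical learning rate `eta`. The sketch enumerates the candidates explicitly, whereas the abstract says WildWood covers all possible subtrees exactly via context tree weighting. That part is not implemented here.

```python
import math
from typing import List, Sequence


def exponential_weights(oob_losses: Sequence[float], eta: float) -> List[float]:
    """Normalised weights proportional to exp(-eta * loss), assuming a uniform prior."""
    # Shifting by the smallest loss avoids underflow; it rescales every weight
    # by the same factor, so the normalised weights are unchanged.
    best = min(oob_losses)
    raw = [math.exp(-eta * (loss - best)) for loss in oob_losses]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the candidate subtrees' predictions at one query point."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical numbers: three candidate subtrees of one tree (for example the root
# alone, a pruned tree and the fully grown tree) with cumulative out-of-bag losses,
# and each subtree's prediction at the same query point.
oob_losses = [12.0, 7.5, 9.0]
subtree_predictions = [0.40, 0.70, 0.90]

weights = exponential_weights(oob_losses, eta=1.0)
prediction = aggregate(subtree_predictions, weights)
print([round(w, 3) for w in weights], round(prediction, 3))
```

The subtree with the lowest out-of-bag loss dominates, but the others still contribute, so the aggregated prediction lies between the individual subtree predictions.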

algorithm, node, prediction, (16 more...)

arXiv.org Machine Learning

2109.0801

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Comparing decision mining approaches with regard to the meaningfulness of their results

Scheibel, Beate, Rinderle-Ma, Stefanie

arXiv.org Artificial Intelligence, Sep-15-2021

Decisions and the underlying rules are indispensable for driving process execution during runtime, i.e., for routing process instances at alternative branches based on the values of process data. Decision rules can comprise unary data conditions, e.g., age > 40, binary data conditions where the relation between two or more variables is relevant, e.g., temperature1 < temperature2, and more complex conditions that refer to, for example, parts of a medical image. Decision discovery aims at automatically deriving decision rules from process event logs. Existing approaches focus on the discovery of unary, or in some instances binary, data conditions. The discovered decision rules are usually evaluated using accuracy, but not with regard to their semantics and meaningfulness, although this is crucial for validation and the subsequent implementation/adaptation of the decision rules. Hence, this paper compares three decision mining approaches, i.e., two existing ones and one newly described approach, with respect to the meaningfulness of their results. For comparison, we use one synthetic data set for a realistic manufacturing case and the two real-world BPIC 2017/2020 logs. The discovered rules are discussed with regard to their semantics and meaningfulness.
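The difference between unary and binary data conditions can be sketched with a brute-force decision stump in Python. This is a common feature-construction trick, not one of the three approaches compared in the paper. The hypothetical log below is labelled by `temperature1 < temperature2`. No single threshold on either raw attribute can express that condition, but a threshold on a derived difference attribute (the hypothetical `t1_minus_t2`) can.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]


def best_stump(rows: List[Record], labels: List[bool], features: List[str]) -> Tuple[float, str, float, bool]:
    """Brute-force search over unary conditions `feature <= threshold`.

    Returns (accuracy, feature, threshold, le_means_true), where le_means_true says
    whether rows satisfying the condition are predicted True or False.
    """
    best: Tuple[float, str, float, bool] = (0.0, "", 0.0, True)
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            for le_means_true in (True, False):
                preds = [(row[feature] <= threshold) == le_means_true for row in rows]
                acc = sum(p == y for p, y in zip(preds, labels)) / len(rows)
                if acc > best[0]:
                    best = (acc, feature, threshold, le_means_true)
    return best


# Hypothetical event log: the routing decision is True exactly when temperature1 < temperature2.
log: List[Record] = [
    {"temperature1": 10, "temperature2": 20},
    {"temperature1": 30, "temperature2": 40},
    {"temperature1": 50, "temperature2": 60},
    {"temperature1": 20, "temperature2": 10},
    {"temperature1": 40, "temperature2": 30},
    {"temperature1": 60, "temperature2": 50},
]
labels = [row["temperature1"] < row["temperature2"] for row in log]

raw_best = best_stump(log, labels, ["temperature1", "temperature2"])

# Derived attribute: the binary condition becomes a unary one, t1_minus_t2 < 0.
for row in log:
    row["t1_minus_t2"] = row["temperature1"] - row["temperature2"]
derived_best = best_stump(log, labels, ["temperature1", "temperature2", "t1_minus_t2"])

print(raw_best)
print(derived_best)
```

On these six records, the best unary stump on the raw attributes classifies only four correctly. The derived attribute yields a perfect split at `t1_minus_t2 <= -10`.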

accuracy, data condition, decision rule, (15 more...)

arXiv.org Artificial Intelligence

2109.07335

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Austria > Vienna (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.35)

Building Accurate Simple Models with Multihop

Dhurandhar, Amit, Pedapati, Tejaswini

arXiv.org Artificial Intelligence, Sep-14-2021

Knowledge transfer from a complex, high-performing model to a simpler and potentially low-performing one in order to enhance its performance has been of great interest over the last few years, as it finds applications in important problems such as explainable artificial intelligence, model compression, robust model building and learning from small data. Known approaches to this problem (viz. Knowledge Distillation, Model compression, ProfWeight, etc.) typically transfer information directly (i.e., in a single hop) from the complex model to the chosen simple model through schemes that modify the target or reweight the training examples on which the simple model is trained. In this paper, we propose a meta-approach where we transfer information from the complex model to the simple model by dynamically selecting and/or constructing a sequence of intermediate models of decreasing complexity that are less intricate than the original complex model. Our approach can transfer information between consecutive models in the sequence using any of the previously mentioned approaches, and can also work in 1-hop fashion, thus generalizing these approaches. In experiments on real data, we observe consistent gains over 1-hop for different choices of models, averaging more than 2% and reaching up to 8% in one case. We also empirically analyze conditions under which the multi-hop approach is likely to be beneficial over the traditional 1-hop approach, and report other interesting insights. To the best of our knowledge, this is the first work that proposes such a multi-hop approach to perform knowledge transfer given a single high-performing complex model, making it, in our opinion, an important methodological contribution.
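The difference between one-hop and multi-hop transfer can be sketched in Python. Everything here is an assumption for illustration: the model family (k-nearest-neighbour regression, where a larger `k` means a smoother, simpler model), the fixed sequence of `k` values, and transfer by training each model on the previous model's predictions. According to the abstract, the paper selects or constructs the intermediate models dynamically and can use other transfer schemes, such as reweighting training examples.

```python
import math
import random
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]
Trainer = Callable[[Sequence[float], Sequence[float]], Predictor]


def knn_trainer(k: int) -> Trainer:
    """Hypothetical model family: 1-D k-nearest-neighbour regression.
    A larger k averages over more neighbours, giving a smoother (simpler) model."""
    def train(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
        data = list(zip(xs, ys))

        def predict(q: float) -> float:
            nearest = sorted(data, key=lambda point: abs(point[0] - q))[:k]
            return sum(y for _, y in nearest) / len(nearest)

        return predict

    return train


def multi_hop_transfer(xs: Sequence[float], ys: Sequence[float], trainers: List[Trainer]) -> List[Predictor]:
    """Train trainers[0] (the complex model) on the true labels, then train each
    later, simpler model on the predictions of the model before it."""
    models = [trainers[0](xs, ys)]
    for train in trainers[1:]:
        previous = models[-1]
        soft_targets = [previous(x) for x in xs]  # one hop: the previous model acts as teacher
        models.append(train(xs, soft_targets))
    return models


random.seed(0)
xs = [i / 100 for i in range(-100, 101)]
ys = [math.sin(3 * x) + random.gauss(0, 0.1) for x in xs]

one_hop = multi_hop_transfer(xs, ys, [knn_trainer(3), knn_trainer(40)])
multi_hop = multi_hop_transfer(xs, ys, [knn_trainer(3), knn_trainer(10), knn_trainer(20), knn_trainer(40)])
print(one_hop[-1](0.5), multi_hop[-1](0.5))
```

Passing only two trainers reproduces the usual one-hop setting. This mirrors the abstract's point that the multi-hop scheme generalizes it.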

complex model, intermediate model, simple model, (16 more...)

arXiv.org Artificial Intelligence

2109.06961

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.47)

07 -- Hands On ML -- Ensemble

#artificialintelligence, Sep-12-2021, 19:11:05 GMT

Ensemble learning combines the predictions of multiple models; for classification, the output is the class that receives the most votes. When you train multiple decision trees, each on a random sample of the dataset drawn with replacement, and predict by taking the majority vote of all the trees, the approach is called Random Forest (random forests additionally consider only a random subset of the features at each split). A voting classifier trains several different classifiers on the same data, such as Logistic Regression, SVM, RF and others, and predicts the class with the majority vote; this is called hard voting. In soft voting, the classifiers' predicted class probabilities are averaged, and the argmax of that average is the predicted class.
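Hard and soft voting can disagree on the same inputs. Here is a minimal Python sketch, assuming each classifier already outputs class probabilities. The numbers are made up for illustration.

```python
from collections import Counter
from typing import Sequence


def hard_vote(probas: Sequence[Sequence[float]]) -> int:
    """Each classifier votes for its most probable class; the majority wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas: Sequence[Sequence[float]]) -> int:
    """Average the class probabilities across classifiers, then take the argmax."""
    n_classes = len(probas[0])
    mean = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Hypothetical probabilities for classes [0, 1] from three classifiers
# (say logistic regression, an SVM with probability outputs, and a random forest).
probas = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
print(hard_vote(probas), soft_vote(probas))  # → 1 0
```

Two of the three classifiers lean slightly towards class 1, so hard voting picks 1. The first classifier's confident 0.90 pulls the averaged probability towards class 0. scikit-learn's `VotingClassifier` exposes the same choice through its `voting='hard'` / `voting='soft'` parameter.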

classifier, ensemble, prediction, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.44)

Data Science Meets Combinatorics

#artificialintelligence, Sep-12-2021, 00:55:19 GMT

As a lifelong computational scientist (and now data scientist) I have always been fascinated with numbers, especially lists and tables of things (databases!). For example, I thought early in life that I wanted to be a math major in college and study number theory so that I could learn all of the amazing ways to do cool stuff with numbers. But I also wanted to study the wonders of the Universe as an astronomer, so I went on to get a PhD in astrophysics! That career path allowed me to study and apply all of the disciplines that I enjoy: math, physics, astronomy, computational modeling, and data science! It was numbers all the time!

algorithm, application, cookie, (13 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection

Adler, Afek Ilay, Painsky, Amichai

arXiv.org Machine Learning, Sep-12-2021

Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data and produce state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners. Specifically, most implementations use decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias has been studied extensively over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We show that although these implementations demonstrate highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining roughly the same level of prediction accuracy.
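The following Python sketch shows why an in-sample split criterion favours high-cardinality categorical features, and why scoring splits on held-out data removes that advantage. It is a simplified illustration of the idea, not the authors' algorithm. It scores a single CART-style Gini split per feature and uses a fixed `i % k` fold assignment. The data and the feature names (`row_id`, `signal`) are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Set


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary 0/1 labels: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def impurity_reduction(cats: Sequence[Hashable], labels: Sequence[int], left: Set[Hashable]) -> float:
    """Gini gain of sending categories in `left` to the left child and the rest right."""
    left_y = [y for c, y in zip(cats, labels) if c in left]
    right_y = [y for c, y in zip(cats, labels) if c not in left]
    child = (len(left_y) * gini(left_y) + len(right_y) * gini(right_y)) / len(labels)
    return gini(labels) - child


def best_categorical_split(cats: Sequence[Hashable], labels: Sequence[int]) -> Set[Hashable]:
    """CART-style search: order categories by mean label, try each prefix as the left child."""
    groups = defaultdict(list)
    for c, y in zip(cats, labels):
        groups[c].append(y)
    order = sorted(groups, key=lambda c: sum(groups[c]) / len(groups[c]))
    best_left, best_gain = set(), float("-inf")
    for i in range(1, len(order)):
        left = set(order[:i])
        gain = impurity_reduction(cats, labels, left)
        if gain > best_gain:
            best_left, best_gain = left, gain
    return best_left


def in_sample_gain(cats: Sequence[Hashable], labels: Sequence[int]) -> float:
    """Choose and score the split on the same rows (the biased criterion)."""
    return impurity_reduction(cats, labels, best_categorical_split(cats, labels))


def cross_validated_gain(cats: Sequence[Hashable], labels: Sequence[int], k: int = 5) -> float:
    """Choose the split on k - 1 folds, score it on the held-out fold, then average."""
    gains: List[float] = []
    for fold in range(k):
        train = [i for i in range(len(labels)) if i % k != fold]
        test = [i for i in range(len(labels)) if i % k == fold]
        left = best_categorical_split([cats[i] for i in train], [labels[i] for i in train])
        gains.append(impurity_reduction([cats[i] for i in test], [labels[i] for i in test], left))
    return sum(gains) / k


# Hypothetical data: 200 rows with alternating binary labels.
labels = [i % 2 for i in range(200)]
row_id = list(range(200))  # one category per row: maximal cardinality, no signal
signal = [y if i % 10 else 1 - y for i, y in enumerate(labels)]  # 2 levels, 10% of rows flipped

for name, feature in [("row_id", row_id), ("signal", signal)]:
    print(name, round(in_sample_gain(feature, labels), 3), round(cross_validated_gain(feature, labels), 3))
```

On this data, the pure-noise `row_id` feature has the larger in-sample gain because it separates the training rows perfectly. Its held-out gain is zero, because every held-out ID is unseen. The two-level `signal` feature keeps a positive held-out gain, so the ranking flips.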

categorical feature, fi measure, implementation, (14 more...)

arXiv.org Machine Learning

2109.05468

Country:

Oceania > New Zealand > North Island > Waikato > Hamilton (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.41)

A Neural Tangent Kernel Perspective of Infinite Tree Ensembles

Kanoh, Ryuichi, Sugiyama, Mahito

arXiv.org Machine Learning, Sep-10-2021

In practice, tree ensembles are among the most popular models, along with neural networks. A soft tree is one of the variants of a decision tree. Instead of using a greedy method for searching splitting rules, the soft tree is trained using a gradient method in which the whole splitting operation is formulated in a differentiable form. Although ensembles of such soft trees have been increasingly used in recent years, little theoretical work has been done to understand their behavior. In this paper, by considering an ensemble of infinite soft trees, we introduce and study the Tree Neural Tangent Kernel (TNTK), which provides new insights into the behavior of the infinite ensemble of soft trees. Using the TNTK, we succeed in theoretically finding several non-trivial properties, such as the effect of the oblivious tree structure and the degeneracy of the TNTK induced by the deepening of the trees. Moreover, we empirically examine the performance of an ensemble of infinite soft trees using the TNTK.
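A soft tree replaces each hard test with a sigmoid gate, so the output becomes a differentiable function of the split parameters. The Python sketch below shows only the forward pass of one such tree. The breadth-first node layout, the routing convention (left with probability sigmoid(w·x + b)) and the `scale` parameter are assumptions for illustration. The TNTK itself is not computed.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_tree_predict(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    leaves: Sequence[float],
    depth: int,
    scale: float = 1.0,
) -> float:
    """Forward pass of a perfect binary soft tree.

    Internal nodes are stored breadth-first (root = 0, children of n are
    2n + 1 and 2n + 2). Node n sends x left with probability
    sigmoid(scale * (w_n . x + b_n)); the output is the sum of the leaf values
    weighted by the probability of the path to each leaf.
    """
    prediction = 0.0
    for leaf in range(2 ** depth):
        prob, node = 1.0, 0
        # Read the leaf index bit by bit, most significant first: 0 = left, 1 = right.
        for level in range(depth - 1, -1, -1):
            go_right = (leaf >> level) & 1
            z = sum(w * xi for w, xi in zip(weights[node], x)) + biases[node]
            p_left = sigmoid(scale * z)
            prob *= (1.0 - p_left) if go_right else p_left
            node = 2 * node + 1 + go_right
        prediction += prob * leaves[leaf]
    return prediction


# Hypothetical depth-1 tree approximating the hard test "x <= 0.5 -> 0.0, else 1.0".
stump = dict(weights=[[-1.0]], biases=[0.5], leaves=[0.0, 1.0], depth=1)
print(soft_tree_predict([0.3], **stump))             # blends both leaves
print(soft_tree_predict([0.3], **stump, scale=100))  # close to the hard answer 0.0
```

As `scale` grows, the gates approach hard 0/1 decisions and the soft tree behaves like an ordinary decision tree. Gradient-based training would differentiate this output with respect to `weights`, `biases` and `leaves`.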

ensemble, equation, tntk, (14 more...)

arXiv.org Machine Learning

2109.04983

Country:

Asia > Middle East > Jordan (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Understanding Random Forests For Machine Learning

#artificialintelligence, Sep-9-2021, 01:40:17 GMT

Random forest has an important place in machine learning for solving both regression and classification problems. It often produces good results even without hyperparameter tuning. So what does hyperparameter tuning mean?

hyperparameter, machine learning, random forest

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Robust Optimal Classification Trees Against Adversarial Examples

Vos, Daniël, Verwer, Sicco

arXiv.org Artificial Intelligence, Sep-8-2021

Decision trees are a popular choice of explainable model, but just like neural networks, they suffer from adversarial examples. Existing algorithms for fitting decision trees robust against adversarial examples are greedy heuristics and lack approximation guarantees. In this paper we propose ROCT, a collection of methods to train decision trees that are optimally robust against user-specified attack models. We show that the min-max optimization problem that arises in adversarial learning can be solved using a single minimization formulation for decision trees with 0-1 loss. We propose such formulations in Mixed-Integer Linear Programming and Maximum Satisfiability, which widely available solvers can optimize. We also present a method that determines the upper bound on adversarial accuracy for any model using bipartite matching. Our experimental results demonstrate that the existing heuristics achieve close to optimal scores while ROCT achieves state-of-the-art scores.
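Here is a minimal Python sketch that makes "adversarial accuracy" concrete. It evaluates, but does not train, a single decision stump under an assumed attack model in which every feature may be shifted by at most `eps`. A sample counts as robust only if no allowed perturbation can move the stump's output away from its true label. ROCT itself solves MILP or MaxSAT formulations to find optimally robust trees, which this sketch does not attempt.

```python
from typing import Set


def reachable_labels(x, feature, threshold, left_label, right_label, eps) -> Set[int]:
    """Labels the stump 'x[feature] <= threshold' can output for any x' with
    |x'_j - x_j| <= eps for every feature j (a box, i.e. L-infinity, attack)."""
    labels = set()
    if x[feature] - eps <= threshold:  # some allowed point falls on the left side
        labels.add(left_label)
    if x[feature] + eps > threshold:  # some allowed point falls on the right side
        labels.add(right_label)
    return labels


def adversarial_accuracy(X, y, feature, threshold, left_label, right_label, eps) -> float:
    """Fraction of samples for which every allowed perturbation keeps the true label."""
    robust = sum(
        reachable_labels(x, feature, threshold, left_label, right_label, eps) == {label}
        for x, label in zip(X, y)
    )
    return robust / len(y)


# Hypothetical one-feature data set and stump "x[0] <= 0.5 -> 0, else 1".
X = [[0.10], [0.45], [0.55], [0.90]]
y = [0, 0, 1, 1]
for eps in (0.0, 0.1):
    print(eps, adversarial_accuracy(X, y, feature=0, threshold=0.5, left_label=0, right_label=1, eps=eps))
```

With `eps = 0.0`, all four points are classified correctly. With `eps = 0.1`, the two points within 0.1 of the threshold can be pushed across it, so adversarial accuracy drops to 0.5.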

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2109.03857

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Machine Learning -- Beginners Guide to Random Forest Classifiers (The Code)

#artificialintelligence, Sep-4-2021, 12:28:34 GMT

So if you haven't already checked it out, I have posted about the mathematics behind this machine learning technique. If this is the first time you're coming across this algorithm I recommend you give it a read before jumping into the code. Otherwise, we're going to jump right into it!

beginner guide, machine learning, random forest classifier

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)
