AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

Beware Default Random Forest Importances

#artificialintelligence | Jun-7-2019, 15:28:26 GMT

Dependence numbers close to one indicate that the feature is completely predictable using the other features, which means it could be dropped without affecting accuracy. For example, the mean radius is extremely important in predicting mean perimeter and mean area, so we can probably drop those two. It also looks like radius error is important to predicting perimeter error and area error, so we can drop those last two. Mean and worst texture also appear to be dependent, so we can drop one of those too. Similarly, let's drop concavity error and fractal dimension error because compactness error seems to predict them well. Worst radius also predicts worst perimeter and worst area well.
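The idea of a per-feature dependence score can be sketched in a few lines. This is a simplified stand-in, not the article's procedure: it scores each feature by the best R^2 obtained from a one-variable least-squares fit on any single other feature, whereas the excerpt describes predicting a feature from all the other features. The toy columns and the names `r_squared` and `dependence` are hypothetical.

```python
import math
from statistics import mean


def r_squared(x, y):
    """R^2 of a one-variable least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no predictable variation
    return (sxy * sxy) / (sxx * syy)


def dependence(features):
    """For each feature, the best R^2 achievable from any single other feature."""
    scores = {}
    for name, values in features.items():
        scores[name] = max(
            r_squared(other, values)
            for other_name, other in features.items()
            if other_name != name
        )
    return scores


# Toy data (made up): perimeter is an exact function of radius, texture is not.
radius = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "radius": radius,
    "perimeter": [2 * math.pi * r for r in radius],
    "texture": [3.0, 1.0, 4.0, 1.0, 5.0],
}
scores = dependence(features)
```

A score near one for `perimeter` flags it as a candidate to drop, while the low score for `texture` suggests it carries information the other columns do not.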

artificial intelligence, beware default random forest importance, decision tree learning, (2 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.45)

The Random Forest Algorithm

#artificialintelligence | Jun-7-2019, 15:28:09 GMT

Random Forest is a flexible, easy-to-use machine learning algorithm that produces good results most of the time, even without hyper-parameter tuning. It is also one of the most widely used algorithms because of its simplicity and because it can be used for both classification and regression tasks. In this post, you will learn how the random forest algorithm works, along with several other important things about it. Random Forest is a supervised learning algorithm. As its name suggests, it builds a forest of decision trees and introduces randomness into how they are built.
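A minimal sketch of the "forest plus randomness" idea, assuming the common recipe of bootstrap sampling, a random feature subset per tree, and a majority vote. To stay short it uses one-split trees (decision stumps) rather than full trees, so it is an illustration and not a production implementation. All names (`fit_stump`, `fit_forest`, `predict_forest`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature subset considered by this tree.
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Toy data (made up): both features separate the two classes.
rows = [(1, 10), (2, 12), (3, 11), (8, 30), (9, 32), (10, 31)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
```

For regression, the same structure would average the trees' numeric predictions instead of taking a vote.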

artificial intelligence, machine learning, random forest, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Ensemble Pruning via Margin Maximization

Martinez, Waldyn

arXiv.org Machine Learning | Jun-7-2019

Ensemble models refer to methods that combine a typically large number of classifiers into a compound prediction. The output of an ensemble method is the result of fitting a base-learning algorithm to a given data set, and obtaining diverse answers by reweighting the observations or by resampling them using a given probabilistic selection. A key challenge of using ensembles in large-scale multidimensional data lies in the complexity and the computational burden associated with them. The models created by ensembles are often difficult, if not impossible, to interpret, and their implementation requires more computational power than single classifiers. Recent research effort in the field has concentrated on reducing ensemble size while maintaining predictive accuracy. We propose a method to prune an ensemble solution by optimizing its margin distribution, while increasing its diversity. The proposed algorithm results in an ensemble that uses only a fraction of the original classifiers, with improved or similar generalization performance. We analyze and test our method on both synthetic and real data sets. The simulations show that the proposed method compares favorably to the original ensemble solutions and to other existing ensemble pruning methodologies.

algorithm, classifier, ensemble, (16 more...)

arXiv.org Machine Learning

1906.03247

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Ohio > Butler County > Oxford (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)
(2 more...)

On the Current State of Research in Explaining Ensemble Performance Using Margins

Martinez, Waldyn, Gray, J. Brian

arXiv.org Machine Learning | Jun-7-2019

... Forests (Breiman, 2001) and rotation forests (Rodriguez et al., 2006), create a set of weak classifiers from a base learning algorithm B, which are typically decision trees, then combine the predictions from the classifiers in the form of a weighted vote, to produce an improved prediction compared to individual classifiers (Drucker et al., 1994; Dietterich, 2000; Breiman, 2001; Maclin and Opitz, 2011). Upper bounds based on the sample margins of the ensemble provide some explanation on why ensembles perform as well as they do. Schapire et al. (1998) first pointed to margins as a key determinant of ensemble performance. ... Other authors suggest that specific margin instances hold a clue to better generalization (Shen and Li, 2010; Wang et al., 2011, 2012). In this article, we design algorithms to empirically test whether the state of research in the explanation of ensemble performance translates into better performing algorithms. We do not question the theoretical soundness of the generalization error bounds, but simply test whether evidence suggests that better performing ensemble algorithms can be derived from the practical interpretations of the bounds. In the next section we discuss margins, the generalization error bounds based on the ...
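The weighted vote and the margin of a single example can be illustrated with a short sketch. It assumes the usual voting-margin definition: the normalized weight cast for the true label minus the largest normalized weight cast for any other label. The function names and the toy votes are hypothetical, and nothing here reproduces the algorithms designed in the paper.

```python
from collections import defaultdict


def vote_totals(predictions, weights):
    """Sum the classifier weights cast for each label."""
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return totals


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight."""
    totals = vote_totals(predictions, weights)
    return max(totals, key=totals.get)


def margin(predictions, weights, true_label):
    """Voting margin in [-1, 1]; positive means the ensemble is correct."""
    totals = vote_totals(predictions, weights)
    correct = totals.get(true_label, 0.0)
    wrong = max(
        (w for label, w in totals.items() if label != true_label),
        default=0.0,
    )
    return (correct - wrong) / sum(weights)


# Four classifiers vote on one example (made-up labels and weights).
preds = ["a", "a", "b", "c"]
weights = [0.4, 0.3, 0.2, 0.1]
winner = weighted_vote(preds, weights)
margin_if_a = margin(preds, weights, "a")
margin_if_b = margin(preds, weights, "b")
```

A margin close to 1 means the ensemble is correct with near-unanimous weight, while a negative margin means the example is misclassified.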

algorithm, artificial intelligence, machine learning, (20 more...)

arXiv.org Machine Learning

1906.03123

Country: North America > United States > Alabama (0.28)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Fréchet random forests

Capitaine, Louis, Genuer, Robin, Thiébaut, Rodolphe

arXiv.org Machine Learning | Jun-4-2019

Random forests are a statistical learning method widely used in many areas of scientific research, mainly for their ability to learn complex relationships between input and output variables and their capacity to handle high-dimensional data. However, data are increasingly complex, with repeated measures of omics, images leading to shapes, curves... The random forests method is not specifically tailored for them. In this paper, we introduce Fréchet trees and Fréchet random forests, which can handle data for which input and output variables take values in general metric spaces (which can be unordered). To this end, a new way of splitting the nodes of trees is introduced and the prediction procedures of trees and forests are generalized. Then, the random forests out-of-bag error and variable importance score are naturally adapted. Finally, the method is studied in the special case of regression on curve shapes, both within a simulation study and on a real dataset from an HIV vaccine trial.

Fréchet random forest, input variable, node, (16 more...)

arXiv.org Machine Learning

1906.01741

Country: North America > United States > New York (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees

Renard, Xavier, Woloszko, Nicolas, Aigrain, Jonathan, Detyniecki, Marcin

arXiv.org Machine Learning | Jun-4-2019

Interpretable surrogates of black-box predictors trained on high-dimensional tabular datasets can struggle to generate comprehensible explanations in the presence of correlated variables. We propose a model-agnostic interpretable surrogate that provides global and local explanations of black-box classifiers to address this issue. We introduce the idea of concepts as intuitive groupings of variables that are either defined by a domain expert or automatically discovered using correlation coefficients. Concepts are embedded in a surrogate decision tree to enhance its comprehensibility. First experiments on FRED-MD, a macroeconomic database with 134 variables, show improvement in human-interpretability while accuracy and fidelity of the surrogate model are preserved.

artificial intelligence, decision tree learning, machine learning, (16 more...)

arXiv.org Machine Learning

1906.01297

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry:

Banking & Finance > Trading (1.00)
Banking & Finance > Economy (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.65)

A Novel Hyperparameter-free Approach to Decision Tree Construction that Avoids Overfitting by Design

Leiva, Rafael Garcia, Anta, Antonio Fernandez, Mancuso, Vincenzo, Casari, Paolo

arXiv.org Artificial Intelligence | Jun-4-2019

Decision trees are an extremely popular machine learning technique. Unfortunately, overfitting in decision trees remains an open issue that sometimes prevents achieving good performance. In this work, we present a novel approach for the construction of decision trees that avoids overfitting by design, without losing accuracy. A distinctive feature of our algorithm is that it requires neither the optimization of any hyperparameters nor the use of regularization techniques, thus significantly reducing the decision tree training time. Moreover, our algorithm produces much smaller and shallower trees than traditional algorithms, facilitating the interpretability of the resulting models.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1906.01246

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.48)

Hybrid Machine Learning Forecasts for the FIFA Women's World Cup 2019

Groll, Andreas, Ley, Christophe, Schauberger, Gunther, Van Eetvelde, Hans, Zeileis, Achim

arXiv.org Machine Learning | Jun-3-2019

In this work, we combine two different ranking methods together with several other predictors in a joint random forest approach for the scores of soccer matches. The first ranking method is based on the bookmaker consensus; the second estimates adequate ability parameters that best reflect the current strength of the teams. The proposed combined approach is then applied to the data from the two previous FIFA Women's World Cups, 2011 and 2015. Finally, based on the resulting estimates, the FIFA Women's World Cup 2019 is simulated repeatedly and winning probabilities are obtained for all teams. The model clearly favors the defending champion, USA, ahead of the host, France.

artificial intelligence, machine learning, random forest, (16 more...)

arXiv.org Machine Learning

1906.01131

Country:

Europe > France (0.26)
Europe > Austria > Vienna (0.14)
North America > Canada (0.05)
(38 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)

The FacT: Taming Latent Factor Models for Explainability with Factorization Trees

Tao, Yiyi, Jia, Yiling, Wang, Nan, Wang, Hongning

arXiv.org Machine Learning | Jun-3-2019

Latent factor models have achieved great success in personalized recommendations, but they are also notoriously difficult to explain. In this work, we integrate regression trees to guide the learning of latent factor models for recommendation, and use the learnt tree structure to explain the resulting latent factors. Specifically, we build regression trees on users and items respectively with user-generated reviews, and associate a latent profile to each node on the trees to represent users and items. As the regression trees grow, the latent factors are gradually refined under the regularization imposed by the tree structure. As a result, we are able to track the creation of latent profiles by looking into the path of each factor on the regression trees, which thus serves as an explanation for the resulting recommendations. Extensive experiments on two large collections of Amazon and Yelp reviews demonstrate the advantage of our model over several competitive baseline algorithms. In addition, our extensive user study confirms the practical value of the explainable recommendations generated by our model.

artificial intelligence, expert system, machine learning, (20 more...)

arXiv.org Machine Learning

1906.02037

Country:

North America > United States (0.46)
Asia (0.28)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.46)
Research Report > New Finding (0.34)

Industry: Consumer Products & Services > Restaurants (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

The Complete Guide to Decision Trees (part 2)

#artificialintelligence | Jun-2-2019, 19:58:07 GMT

Now you may ask yourself: how do DTs know which features to select and how to split the data? To understand that, we need to get into some details. All DTs perform basically the same task: they examine all the attributes of the dataset to find the ones that give the best possible result by splitting the data into subgroups. They perform this task recursively by splitting subgroups into smaller and smaller units until the Tree is finished (stopped by certain criteria). This decision of making splits heavily affects the Tree's accuracy and performance, and for that decision, DTs can use different algorithms that differ in the possible structure of the Tree (e.g. the number of splits per node), the criteria on how to perform the splits, and when to stop splitting.
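The search over attributes and split points can be sketched as follows. This assumes binary splits of the form `value <= threshold` scored by weighted Gini impurity, which is one common criterion among the several the excerpt alludes to. The names `gini` and `best_split` and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):               # examine every attribute
        for t in sorted({r[f] for r in rows}):  # and every observed value
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty subgroups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
rows = [(2.0, 7.0), (3.0, 1.0), (4.0, 6.0), (7.0, 2.0), (8.0, 8.0), (9.0, 3.0)]
labels = ["no", "no", "no", "yes", "yes", "yes"]
split = best_split(rows, labels)
```

A full tree would apply `best_split` recursively to each resulting subgroup until a stopping criterion is met, such as a pure node, a maximum depth, or a minimum group size.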

artificial intelligence, dts, machine learning, (17 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
