AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance

Mentch, Lucas; Zhou, Siyu

arXiv.org Machine Learning, Mar-7-2020

As the size, complexity, and availability of data continue to grow, scientists are increasingly relying upon black-box learning algorithms that can often provide accurate predictions with minimal a priori model specifications. Tools like random forest have an established track record of off-the-shelf success and even offer various strategies for analyzing the underlying relationships between features and the response. Motivated by recent insights into random forest behavior, here we introduce the idea of augmented bagging (AugBagg), a procedure that operates in an identical fashion to the classical bagging and random forest counterparts but on a larger space containing additional, randomly generated features. Somewhat surprisingly, we demonstrate that the simple act of adding random features to the model can have a dramatic beneficial effect on performance, sometimes outperforming even an optimally tuned traditional random forest. This finding, that including an additional set of features generated independently of the response can considerably improve predictive performance, has crucial implications for the manner in which we consider and measure variable importance. Numerous demonstrations on both real and synthetic data are provided.

noise feature, procedure, random forest, (13 more...)

arXiv:2003.03629

Country: Oceania > Australia > Tasmania (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
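
The augmentation step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and standard Gaussian noise is an assumption (the abstract says only that the extra features are randomly generated, independently of the response). A base learner would then be fit on each bootstrap sample of the augmented data; that part is omitted here.

```python
import random


def augment_with_noise(rows, n_noise, seed=0):
    """Return a copy of `rows` with `n_noise` extra N(0, 1) features appended.

    The noise never looks at a response variable, so it is independent of it.
    Gaussian noise is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [list(row) + [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            for row in rows]


def bootstrap_indices(n, seed=0):
    """Indices for one bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X_aug = augment_with_noise(X, n_noise=3)

# One bagging round would train a tree on this resample of the augmented rows.
sample = [X_aug[i] for i in bootstrap_indices(len(X_aug))]
```

The original columns are left untouched; only the feature space seen by each tree grows.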

Interactive Decision Tree Software for Customer Service

#artificialintelligence, Mar-6-2020, 06:39:05 GMT

Brands today have detailed processes with dynamic changes to serve millennials across the globe. As a CX Head, it is important to use decision tree software to ensure that the right information is given to customers across assisted and digital channels. Knowmax is decision tree software that helps you create workflows and publish them across all customer touchpoints using an interactive UI backed by robust analytics.

customer service, interactive decision tree software

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Causal Interaction Trees: Tree-Based Subgroup Identification for Observational Data

Yang, Jiabei; Dahabreh, Issa J.; Steingrimsson, Jon A.

arXiv.org Machine Learning, Mar-6-2020

We propose Causal Interaction Trees for identifying subgroups of participants that have enhanced treatment effects using observational data. We extend the Classification and Regression Tree algorithm by using splitting criteria that focus on maximizing between-group treatment effect heterogeneity based on subgroup-specific treatment effect estimators to dictate decision-making in the algorithm. We derive properties of three subgroup-specific treatment effect estimators that account for the observational nature of the data -- inverse probability weighting, g-formula and doubly robust estimators. We study the performance of the proposed algorithms using simulations and implement the algorithms in an observational study that evaluates the effectiveness of right heart catheterization on critically ill patients.

algorithm, estimator, tree algorithm, (14 more...)

arXiv:2003.03042

Country: North America > Greenland (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)
Research Report > Strength High (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
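
Of the three estimators named in the abstract above, inverse probability weighting is the simplest to sketch. The code below is a textbook (Horvitz-Thompson style) form applied to the rows of one candidate subgroup; the function name is hypothetical, the propensity scores are assumed to be already estimated, and the paper's exact estimator may differ (for example, by normalizing the weights).

```python
def ipw_treatment_effect(treated, outcome, propensity):
    """Inverse-probability-weighted estimate of the average treatment effect.

    treated    : 0/1 treatment indicators
    outcome    : observed outcomes
    propensity : estimated P(treated = 1 | covariates), strictly in (0, 1)
    """
    n = len(outcome)
    # Weight treated outcomes by 1/e and control outcomes by 1/(1 - e).
    treated_mean = sum(a * y / e
                       for a, y, e in zip(treated, outcome, propensity)) / n
    control_mean = sum((1 - a) * y / (1 - e)
                       for a, y, e in zip(treated, outcome, propensity)) / n
    return treated_mean - control_mean


# With a constant propensity of 0.5 this reduces to a difference in group means.
effect = ipw_treatment_effect(
    treated=[1, 0, 1, 0],
    outcome=[3.0, 1.0, 5.0, 2.0],
    propensity=[0.5, 0.5, 0.5, 0.5],
)
```

A tree-growing procedure in this spirit would compute such an estimate in each child of a candidate split and prefer splits where the two estimates differ most.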

Unbiased variable importance for random forests

Loecher, Markus

arXiv.org Machine Learning, Mar-4-2020

The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationally demanding and suffers from other shortcomings. We propose a simple solution to the misleading/untrustworthy Gini importance, which can be viewed as an overfitting problem: we compute the loss reduction on the out-of-bag instead of the in-bag training samples.

importance measure, impurity, oob, (15 more...)

arXiv:2003.02106

Country: Europe > Germany > Berlin (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
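
The idea in the abstract above, scoring a split by its loss reduction on out-of-bag rather than in-bag samples, can be illustrated with a small sketch. The function names and the toy data are hypothetical, and the paper's actual estimator may combine in-bag and out-of-bag quantities differently; this only shows that the same split can look useful on the data that chose it and useless on held-out data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, labels, feature, threshold):
    """Impurity reduction from splitting on rows[i][feature] <= threshold."""
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted


# The split at 0.5 was (hypothetically) chosen on the in-bag sample...
in_bag_X, in_bag_y = [[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1]
# ...and is then re-scored on samples the tree never saw.
oob_X, oob_y = [[0.3], [0.4], [0.6], [0.7]], [0, 1, 0, 1]

in_bag_gain = gini_gain(in_bag_X, in_bag_y, feature=0, threshold=0.5)
oob_gain = gini_gain(oob_X, oob_y, feature=0, threshold=0.5)
```

Here the in-bag gain is 0.5 while the out-of-bag gain is 0, which is the overfitting pattern the abstract attributes to in-bag Gini importance.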

A review of machine learning applications in wildfire science and management

Jain, Piyush; Coogan, Sean C P; Subramanian, Sriram Ganapathi; Crowley, Mark; Taylor, Steve; Flannigan, Mike D

arXiv.org Machine Learning, Mar-1-2020

Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then the field has progressed rapidly alongside the wide adoption of machine learning (ML) in the environmental sciences. Here, we present a scoping review of ML in wildfire science and management. Our objective is to improve awareness of ML among wildfire scientists and managers, as well as illustrate the challenging range of problems in wildfire science available to data scientists. We first present an overview of popular ML approaches used in wildfire science to date, and then review their use in wildfire science within six problem domains: 1) fuels characterization, fire detection, and mapping; 2) fire weather and climate change; 3) fire occurrence, susceptibility, and risk; 4) fire behavior prediction; 5) fire effects; and 6) fire management. We also discuss the advantages and limitations of various ML approaches and identify opportunities for future advances in wildfire science and management within a data science context. We identified 298 relevant publications, where the most frequently used ML methods included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. There exist opportunities to apply more current ML methods (e.g., deep learning and agent-based learning) in wildfire science. However, despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods requires sophisticated knowledge for their application. Finally, we stress that the wildfire research and management community plays an active role in providing relevant, high-quality data for use by practitioners of ML methods.

agricultural and forest meteorology, classification and regression problem, geoscience and remote sensing letter, (16 more...)

arXiv:2003.00646

Country:

Asia > China > Fujian Province (0.14)
North America > United States > California > San Mateo County > San Mateo (0.13)
Europe > Greece (0.04)
(84 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.68)

Industry:

Energy (1.00)
Law Enforcement & Public Safety > Fire & Emergency Services (0.93)
Government > Regional Government > North America Government > United States Government (0.92)
Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(10 more...)

Online Hierarchical Forecasting for Power Consumption Data

Brégère, Margaux; Huard, Malo

arXiv.org Machine Learning, Mar-1-2020

We study the forecasting of the power consumption of a population of households and of subpopulations thereof. These subpopulations are built according to location, to exogenous information and/or to profiles we determined from historical household consumption time series. Thus, we aim to forecast the electricity consumption time series at several levels of household aggregation. These time series are linked through some summation constraints which induce a hierarchy. Our approach consists of three steps: feature generation, aggregation and projection. Firstly (feature generation step), we build, for each considered group of households, benchmark forecasts (called features), using random forests or generalized additive models. Secondly (aggregation step), aggregation algorithms, run in parallel, aggregate these forecasts and provide new predictions. Finally (projection step), we use the summation constraints induced by the time series' underlying hierarchy to reconcile the forecasts by projecting them onto a well-chosen linear subspace. We provide some theoretical guarantees on the average prediction error of this methodology, through the minimization of a quantity called regret. We also test our approach on household power consumption data collected in Great Britain by multiple energy providers in the Energy Demand Research Project context. We build and compare various population segmentations to evaluate the performance of our approach.

algorithm, consumption, forecast, (15 more...)

arXiv:2003.00585

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Modeling & Simulation (0.91)
(2 more...)
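
The projection step described in the abstract above can be sketched for the smallest possible hierarchy: one total and its parts. The code assumes a plain orthogonal (Euclidean) projection onto the subspace where the total equals the sum of the parts; the paper's "well-chosen linear subspace" and projection may differ, and the function name is hypothetical.

```python
def reconcile(total, parts):
    """Orthogonally project (total, *parts) onto {total == sum(parts)}.

    The constraint is c . y = 0 with c = (1, -1, ..., -1), so the projection
    subtracts c * (c . y) / (c . c) from the forecast vector y.
    """
    gap = total - sum(parts)        # c . y: how far the forecasts are from coherent
    step = gap / (1 + len(parts))   # divided by c . c = 1 + number of parts
    return total - step, [p + step for p in parts]


# Forecasts that violate the summation constraint: 10 != 4 + 3.
new_total, new_parts = reconcile(10.0, [4.0, 3.0])
```

The incoherence of 3 is spread evenly over the three series, giving a total of 9 and parts of 5 and 4, which now add up.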

Random Forest Algorithm for Machine Learning

#artificialintelligence, Feb-29-2020, 13:09:19 GMT

Have you ever asked yourself a series of questions in order to help make a final decision on something? Maybe it was a simple decision like what you wanted to eat for dinner. You might have asked yourself if you wanted to cook or pick food up or get delivery. If you decided to cook, then you would have needed to figure out what type of cuisine you were in the mood for. And lastly, you probably needed to figure out if you had all of the ingredients in your fridge or needed to make a run to the store.

artificial intelligence, decision tree, machine learning, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.55)

Decision Trees Overfitting and Pruning

#artificialintelligence, Feb-29-2020, 00:35:22 GMT

In this video, we will discuss practical considerations in designing a decision tree model. We will discuss how to overcome overfitting in decision trees, three ways to prune the tree, and how to handle missing attributes and continuous values. This channel is part of CSEdu4All, an educational initiative that aims to make computer science education accessible to all! We believe that everyone has the right to a good education, and that geographical and political boundaries should not be a barrier to obtaining knowledge and information. We hope that you will join and support us in this endeavor!

csedu4all, decision tree overfitting and pruning, video

Industry: Education (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Decision Trees for Decision-Making under the Predict-then-Optimize Framework

Elmachtoub, Adam N.; Liang, Jason Cheuk Nam; McNellis, Ryan

arXiv.org Machine Learning, Feb-29-2020

We consider the use of decision trees for decision-making problems under the predict-then-optimize framework. That is, we would like to first use a decision tree to predict unknown input parameters of an optimization problem, and then make decisions by solving the optimization problem using the predicted parameters. A natural loss function in this framework is to measure the suboptimality of the decisions induced by the predicted input parameters, as opposed to measuring loss using input parameter prediction error. This natural loss function is known in the literature as the Smart Predict-then-Optimize (SPO) loss, and we propose a tractable methodology called SPO Trees (SPOTs) for training decision trees under this loss. SPOTs benefit from the interpretability of decision trees, providing an interpretable segmentation of contextual features into groups with distinct optimal solutions to the optimization problem of interest. We conduct several numerical experiments on synthetic and real data including the prediction of travel times for shortest path problems and predicting click probabilities for news article recommendation. We demonstrate on these datasets that SPOTs simultaneously provide higher quality decisions and significantly lower model complexity than other machine learning approaches (e.g., CART) trained to minimize prediction error.

decision tree, decision-making, optimization problem, (15 more...)

arXiv.org Machine Learning

2003.0036

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
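
The loss described in the abstract above measures how much worse a decision is when it is optimized against predicted rather than true parameters. A minimal sketch follows, under a simplifying assumption made only for illustration: the feasible set is a short list of candidate decisions (say, candidate routes), each with a single cost, so "solving the optimization problem" is just picking the cheapest entry. The function names are hypothetical.

```python
def best_decision(costs):
    """Index of the cheapest candidate decision (first one wins ties)."""
    return min(range(len(costs)), key=lambda i: costs[i])


def spo_loss(predicted_costs, true_costs):
    """Suboptimality, under the true costs, of the decision chosen from predictions."""
    chosen = best_decision(predicted_costs)   # what we would actually do
    optimal = best_decision(true_costs)       # what we should have done
    return true_costs[chosen] - true_costs[optimal]


# The prediction ranks the routes wrongly, so the slower route is taken: loss 5 - 3.
loss_wrong_ranking = spo_loss(predicted_costs=[6.0, 4.0], true_costs=[3.0, 5.0])

# A numerically poor prediction that still ranks the routes correctly costs nothing.
loss_right_ranking = spo_loss(predicted_costs=[1.0, 9.0], true_costs=[3.0, 5.0])
```

The second case shows why this loss differs from prediction error: large errors in the predicted costs are harmless as long as they lead to the same decision.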

Deep differentiable forest with sparse attention for the tabular data

Chen, Yingshi

arXiv.org Machine Learning, Feb-29-2020

We present a general architecture for a deep differentiable forest and its sparse attention mechanism. The differentiable forest has the advantages of both trees and neural networks. Its structure is a simple binary tree, easy to use and understand. It is fully differentiable, and all variables are learnable parameters. We train it with gradient-based optimization, which has shown great power in the training of deep CNNs. We find and analyze the attention mechanism in the differentiable forest. That is, each decision depends on only a few important features, and the others are irrelevant. The attention is always sparse. Based on this observation, we improve its sparsity by data-aware initialization. We use the attribute importance to initialize the attention weights. The learned weights are then much sparser than those from random initialization. Our experiments on some large tabular datasets show that the differentiable forest has higher accuracy than GBDT, which is the state-of-the-art algorithm for tabular datasets. The source code is available at https://github.com/closest-git/QuantumForest

arxiv preprint arxiv, dataset, decision tree, (14 more...)

arXiv:2003.00223

Country: Asia > China > Fujian Province > Xiamen (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
