AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Contextual Care Protocol using Neural Networks and Decision Trees

Sinha, Yash Pratyush, Malviya, Pranshu, Panda, Minerva, Ali, Syed Mohd

arXiv.org Machine LearningNov-15-2018

A contextual care protocol is used by a medical practitioner for patient healthcare, given the context or situation that the specified patient is in. This paper proposes a method to build an automated self-adapting protocol which can help make relevant, early decisions for effective healthcare delivery. The hybrid model leverages neural networks and decision trees. The neural network estimates the chances of each disease and each tree in the decision trees represents care protocol for a disease. These trees are subject to change in case of aberrations found by the diagnosticians. These corrections or prediction errors are clustered into similar groups for scalability and review by the experts. The corrections as suggested by the experts are incorporated into the model.

artificial intelligence, decision tree, machine learning, (14 more...)

arXiv.org Machine Learning

doi: 10.1109/ICAECC.2018.8479433

1811.06437

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Machine Learning Analysis of Heterogeneity in the Effect of Student Mindset Interventions

Johansson, Fredrik D.

arXiv.org Machine LearningNov-14-2018

We study heterogeneity in the effect of a mindset intervention on student-level performance through an observational dataset from the National Study of Learning Mindsets (NSLM). Our analysis uses machine learning (ML) to address the following associated problems: assessing treatment group overlap and covariate balance, imputing conditional average treatment effects, and interpreting imputed effects. By comparing several different model families we illustrate the flexibility of both off-the-shelf and purpose-built estimators. We find that the mindset intervention has a positive average effect of 0.26, 95%-CI [0.22, 0.30], and that heterogeneity in the range of [0.1, 0.4] is moderated by school-level achievement level, poverty concentration, urbanicity, and student prior expectations.

artificial intelligence, heterogeneity, machine learning, (18 more...)

arXiv.org Machine Learning

1811.05975

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

THORS: An Efficient Approach for Making Classifiers Cost-sensitive

Tian, Ye, Zhang, Weiping

arXiv.org Machine LearningNov-7-2018

In this paper, we propose an effective TH resholding method based on ORder S tatistic, called THORS, to convert an arbitrary scoring-type classifier, which can induce a continuous cumulative distribution function of the score, into a cost-sensitive one. The procedure, uses order statistic to find an optimal threshold for classification, requiring almost no knowledge of classifiers itself. Unlike common data-driven methods, we analytically show that THORS has theoretical guaranteed performance, theoretical bounds for the costs and lower time complexity. Coupled with empirical results on several real-world data sets, we argue that THORS is the preferred cost-sensitive technique. Key words: Classification; Cost-sensitive learning; Imbalanced data set; Statistical learning; Threshold adjusting.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

1811.02814

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(4 more...)

Add feedback

FairMod - Making Predictive Models Discrimination Aware

Liu, Jixue, Li, Jiuyong, Liu, Lin, Le, Thuc Duy, Ye, Feiyue, Li, Gefei

arXiv.org Artificial IntelligenceNov-4-2018

Predictive models such as decision trees and neural networks may produce discrimination in their predictions. This paper proposes a method to post-process the predictions of a predictive model to make the processed predictions non-discriminatory. The method considers multiple protected variables together. Multiple protected variables make the problem more challenging than a simple protected variable. The method uses a well-cited discrimination metric and adapts it to allow the specification of explanatory variables, such as position, profession, education, that describe the contexts of the applications. It models the post-processing of predictions problem as a nonlinear optimization problem to find best adjustments to the predictions so that the discrimination constraints of all protected variables are all met at the same time. The proposed method is independent of classification methods. It can handle the cases that existing methods cannot handle: satisfying multiple protected attributes at the same time, allowing multiple explanatory attributes, and being independent of classification model types. An evaluation using four real world data sets shows that the proposed method is as effectively as existing methods, in addition to its extra power.

data mining, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

1811.0148

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.83)

Industry: Banking & Finance > Insurance (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Modeling Stated Preference for Mobility-on-Demand Transit: A Comparison of Machine Learning and Logit Models

Zhao, Xilei, Yan, Xiang, Yu, Alan, Van Hentenryck, Pascal

arXiv.org Artificial IntelligenceNov-3-2018

Logit models are usually applied when studying individual travel behavior, i.e., to predict travel mode choice and to gain behavioral insights on traveler preferences. Recently, some studies have applied machine learning to model travel mode choice and reported higher out-of-sample prediction accuracy than conventional logit models (e.g., multinomial logit). However, there has not been a comprehensive comparison between logit models and machine learning that covers both prediction and behavioral analysis. This paper aims at addressing this gap by examining the key differences in model development, evaluation, and behavioral interpretation between logit and machine-learning models for travel-mode choice modeling. To complement the theoretical discussions, we also empirically evaluated the two approaches on stated-preference survey data for a new type of transit system integrating high-frequency fixed routes and micro-transit. The results show that machine learning can produce significantly higher predictive accuracy than logit models and are better at capturing the nonlinear relationships between trip attributes and mode-choice outcomes. On the other hand, compared to the multinomial logit model, the best-performing machine-learning model, the random forest model, produces less reasonable behavioral outputs (i.e. marginal effects and elasticities) when they were computed from a standard approach. By introducing some behavioral constraints into the computation of behavioral outputs from a random forest model, however, we obtained better results that are somewhat comparable with the multinomial logit model. We believe that there is great potential in merging ideas from machine learning and conventional statistical methods to develop refined models for travel-behavior research and suggest some possible research directions.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1811.01315

Country: North America > United States > New York (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(3 more...)

Add feedback

Risk-Stratify: Confident Stratification Of Patients Based On Risk

Ahuja, Kartik, van der Schaar, Mihaela

arXiv.org Machine LearningNov-2-2018

A clinician desires to use a risk-stratification method that achieves confident risk-stratification - the risk estimates of the different patients reflect the true risks with a high probability. This allows him/her to use these risks to make accurate predictions about prognosis and decisions about screening, treatments for the current patient. We develop Risk-stratify - a two phase algorithm that is designed to achieve confident risk-stratification. In the first phase, we grow a tree to partition the covariate space. Each node in the tree is split using statistical tests that determine if the risks of the child nodes are different or not. The choice of the statistical tests depends on whether the data is censored (Log-rank test) or not (U-test). The set of the leaves of the tree form a partition. The risk distribution of patients that belong to a leaf is different from the sibling leaf but not the rest of the leaves. Therefore, some of the leaves that have similar underlying risks are incorrectly specified to have different risks. In the second phase, we develop a novel recursive graph decomposition approach to address this problem. We merge the leaves of the tree that have similar risks to form new leaves that form the final output. We apply Risk-stratify on a cohort of patients (with no history of cardiovascular disease) from UK Biobank and assess their risk for cardiovascular disease. Risk-stratify significantly improves risk-stratification, i.e., a lower fraction of the groups have over/under estimated risks (measured in terms of false discovery rate; 33% reduction) in comparison to state-of-the-art methods for cardiovascular prediction (Random forests, Cox model, etc.). We find that the Cox model significantly over estimates the risk of 21,621 patients out of 216,211 patients. Risk-stratify can accurately categorize 2,987 of these 21,621 patients as low-risk individuals.

artificial intelligence, machine learning, partition, (19 more...)

arXiv.org Machine Learning

1811.00753

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)

Add feedback

Machine Learning Basics - Random Forest

#artificialintelligenceOct-31-2018, 23:27:27 GMT

RF is based on decision trees. In machine learning decision trees are a technique for creating predictive models. They are called decision trees because the prediction follows several branches of "if… then…" decision splits - similar to the branches of a tree. If we imagine that we start with a sample, which we want to predict a class for, we would start at the bottom of a tree and travel up the trunk until we come to the first split-off branch. This split can be thought of as a feature in machine learning, let's say it would be "age"; we would now make a decision about which branch to follow: "if our sample has an age bigger than 30, continue along the left branch, else continue along the right branch".

artificial intelligence, decision tree learning, machine learning, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Computational Intelligence in Sports: A Systematic Literature Review

Bonidia, Robson P., Rodrigues, Luiz A. L., Avila-Santos, Anderson P., Sanches, Danilo S., Brancher, Jacques D.

arXiv.org Artificial IntelligenceOct-30-2018

Recently, data mining studies are being successfully conducted to estimate several parameters in a variety of domains. Data mining techniques have attracted the attention of the information industry and society as a whole, due to a large amount of data and the imminent need to turn it into useful knowledge. However, the effective use of data in some areas is still under development, as is the case in sports, which in recent years, has presented a slight growth; consequently, many sports organizations have begun to see that there is a wealth of unexplored knowledge in the data extracted by them. Therefore, this article presents a systematic review of sports data mining. Regarding years 2010 to 2018, 31 types of research were found in this topic. Based on these studies, we present the current panorama, themes, the database used, proposals, algorithms, and research opportunities. Our findings provide a better understanding of the sports data mining potentials, besides motivating the scientific community to explore this timely and interesting topic.

data mining, evolutionary algorithm, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1810.1285

Country:

South America > Brazil (0.46)
North America > United States (0.46)
Europe > United Kingdom > England (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.88)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine (1.00)
Leisure & Entertainment > Sports > Soccer (0.46)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(4 more...)

Add feedback

How to Interpret a Random Forest Model (Machine Learning with Python)

#artificialintelligenceOct-29-2018, 06:25:30 GMT

Machine Learning is a fast evolving field – but a few things would remain as they were years ago. One such thing is ability to interpret and explain your machine learning models. If you build a model and can not explain it to your business users – it is very unlikely that it will see the light of the day. Can you imagine integrating a model into your product without understanding how it works? Or which features are impacting your final result? In addition to backing from stakeholders, we as data scientists benefit from interpreting our work and improving upon it. The first article of this fast.ai I'm delighted to share part 2 of this series, which primarily deals with how you can intepret a random forest model. We will understand the theory and also implement it in Python to solidify our grasp on this critical concept.

artificial intelligence, machine learning, prediction, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.65)

Add feedback

Dealing with Uncertain Inputs in Regression Trees

Tami, Myriam, Clausel, Marianne, Devijver, Emilie, Dulac, Adrien, Gaussier, Eric, Janaqi, Stefan, Chebre, Meriam

arXiv.org Machine LearningOct-27-2018

Tree-based ensemble methods, as Random Forests and Gradient Boosted Trees, have been successfully used for regression in many applications and research studies. Furthermore, these methods have been extended in order to deal with uncertainty in the output variable, using for example a quantile loss in Random Forests (Meinshausen, 2006). To the best of our knowledge, no extension has been provided yet for dealing with uncertainties in the input variables, even though such uncertainties are common in practical situations. We propose here such an extension by showing how standard regression trees optimizing a quadratic loss can be adapted and learned while taking into account the uncertainties in the input. By doing so, one no longer assumes that an observation lies into a single region of the regression tree, but rather that it belongs to each region with a certain probability. Experiments conducted on several data sets illustrate the good behavior of the proposed extension.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Machine Learning

1810.11698

Country:

Europe > France (0.31)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback