AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
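
As a minimal sketch of the idea in the quotation, the snippet below encodes one Boolean function as a tree of property tests. The tuple representation, the `decide` helper, and the property names (`hungry`, `raining`, `has_umbrella`) are all hypothetical choices made for illustration; they do not come from the book.

```python
# Hypothetical representation: a tree is either a bool (a leaf holding the
# yes/no decision) or a tuple (property_name, subtree_if_false, subtree_if_true).

def decide(tree, example):
    """Test one property per internal node until a leaf is reached."""
    while not isinstance(tree, bool):
        prop, if_false, if_true = tree
        tree = if_true if example[prop] else if_false
    return tree


# Illustrative tree for: hungry AND (NOT raining OR has_umbrella)
GO_OUT = (
    "hungry",
    False,                               # not hungry -> no
    ("raining",
     True,                               # hungry, dry -> yes
     ("has_umbrella", False, True)),     # hungry, raining -> depends on umbrella
)

if __name__ == "__main__":
    print(decide(GO_OUT, {"hungry": True, "raining": True, "has_umbrella": True}))
```

Each root-to-leaf path corresponds to one conjunction of property tests, which is why a tree of this kind can represent any Boolean function of its input properties.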

Log-based Evaluation of Label Splits for Process Models

Tax, Niek, Sidorova, Natalia, Haakma, Reinder, van der Aalst, Wil M. P.

arXiv.org Artificial Intelligence | Jun-23-2016

Process mining techniques aim to extract insights into processes from event logs. One of the challenges in process mining is identifying interesting and meaningful event labels that contribute to a better understanding of the process. Our application area is mining data from smart homes for the elderly, where the ultimate goal is to signal deviations from usual behavior and provide timely recommendations in order to extend the period of independent living. Extracting individual process models showing user behavior is an important instrument in achieving this goal. However, the interpretation of sensor data at an appropriate abstraction level is not straightforward. For example, a motion sensor in a bedroom can be triggered by tossing and turning in bed or by getting up. We try to derive the actual activity depending on the context (time, previous events, etc.). In this paper we introduce the notion of label refinements, which links more abstract event descriptions with their more refined counterparts. We present a statistical evaluation method to determine the usefulness of a label refinement for a given event log from a process perspective. Based on data from smart homes, we show how our statistical evaluation method for label refinements can be used in practice. Our method was able to select two label refinements out of a set of candidate label refinements that both had a positive effect on model precision.

event log, label refinement, mountain rd, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.procs.2016.08.096

1606.07259

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Information Technology > Smart Houses & Appliances (0.74)
Materials > Metals & Mining (0.66)
Health & Medicine > Health Care Providers & Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Data Science > Data Mining (0.71)
Information Technology > Information Management (0.67)
(2 more...)

Interpretable Machine Learning Models for the Digital Clock Drawing Test

Souillard-Mandar, William, Davis, Randall, Rudin, Cynthia, Au, Rhoda, Penney, Dana

arXiv.org Machine Learning | Jun-22-2016

The Clock Drawing Test (CDT) is a rapid, inexpensive, and popular neuropsychological screening tool for cognitive conditions. The Digital Clock Drawing Test (dCDT) uses novel software to analyze data from a digitizing ballpoint pen that reports its position with considerable spatial and temporal precision, making possible the analysis of both the drawing process and final product. We developed methodology to analyze pen stroke data from these drawings, and computed a large collection of features which were then analyzed with a variety of machine learning techniques. The resulting scoring systems were designed to be more accurate than the systems currently used by clinicians, but just as interpretable and easy to use. The systems also allow us to quantify the tradeoff between accuracy and interpretability. We created automated versions of the CDT scoring systems currently used by clinicians, allowing us to benchmark our models, which indicated that our machine learning models substantially outperformed the existing scoring systems.

cognitive impairment, digital clock, interpretable machine learning model, (10 more...)

arXiv.org Machine Learning

1606.07163

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.83)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

An effective approach for classification of advanced malware with high accuracy

Sharma, Ashu, Sahay, Sanjay K.

arXiv.org Artificial Intelligence | Jun-22-2016

Combating malware is very important for software/systems security, but protecting software/systems from advanced malware, viz. metamorphic malware, is a challenging task, as it changes its structure/code after each infection. Therefore, in this paper, we present a novel approach to detect advanced malware with high accuracy by grouping the executables and analyzing the occurrence of opcodes (features). These groups are made on the basis of our earlier studies [1], which found that the difference between the sizes of any two malware samples generated by popular advanced malware kits, viz. PS-MPC, G2 and NGVCK, is within 5 KB. On the basis of the promising features obtained, we studied the performance of thirteen classifiers using N-fold cross-validation available in the machine learning tool WEKA. Among these thirteen classifiers, we studied in depth the top five (Random forest, LMT, NBT, J48 and FT) and obtained more than 96.28% accuracy for the detection of unknown malware, which is better than the maximum detection accuracy (95.9%) reported by Santos et al. (2013). Among these top five classifiers, our approach obtained a detection accuracy of 97.95% with Random forest.
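
The N-fold cross-validation mentioned in this abstract can be sketched in plain Python as follows. This is not WEKA's implementation: the function names, the contiguous and unshuffled fold layout, and the caller-supplied `fit` and `predict` callables are assumptions made only to illustrate the evaluation scheme.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous (train, test) index pairs."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)   # first `extra` folds get one more
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def cross_validated_accuracy(fit, predict, xs, ys, n_folds):
    """Mean held-out accuracy; `fit` and `predict` are hypothetical callables."""
    scores = []
    for train, test in k_fold_indices(len(xs), n_folds):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(predict(model, xs[i]) == ys[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

Every sample is used for testing exactly once, so the averaged score reflects performance on data the model did not see during fitting. In practice the samples would usually be shuffled (and often stratified by class) before folding, which this sketch omits.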

artificial intelligence, machine learning, malware, (18 more...)

arXiv.org Artificial Intelligence

1606.06897

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

S0264837716305440

#artificialintelligence | Jun-18-2016, 06:33:29 GMT

In this work, we particularly focus on the complex relationship between land-use and transport offering an innovative approach to the problem by using land-use features at two differing levels of granularity (the more general land-use sector types and the more granular amenity structures) to evaluate their impact on public transit ridership in both time and space. To quantify the interdependencies, we explored three machine learning models and demonstrate that the decision tree model performs best in terms of overall performance--good predictive accuracy, generality, computational efficiency, and "interpretability". We then demonstrate how the developed framework can be applied to urban planning for transit-oriented development by exploring practicable scenarios based on Singapore's urban plan toward 2030, which includes the development of "regional centers" (RCs) across the city-state. This trend, on the other hand, eventually reverses (particularly during peak hours) with continued strategic increase in amenities; a tipping point at 55% increase is identified where ridership begins to decline and at 110%, the predicted ridership begins to fall below current levels.

artificial intelligence, decision tree learning, ridership, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)

Impacts of land use and amenities on public transport use, urban planning and design

#artificialintelligence | Jun-18-2016, 06:33:29 GMT

Various land-use configurations are known to have wide-ranging effects on the dynamics of and within other city components including the transportation system. In this work, we particularly focus on the complex relationship between land-use and transport offering an innovative approach to the problem by using land-use features at two differing levels of granularity (the more general land-use sector types and the more granular amenity structures) to evaluate their impact on public transit ridership in both time and space. To quantify the interdependencies, we explored three machine learning models and demonstrate that the decision tree model performs best in terms of overall performance--good predictive accuracy, generality, computational efficiency, and "interpretability". Results also reveal that amenity-related features are better predictors than the more general ones, which suggests that high-resolution geo-information can provide more insights into the dependence of transit ridership on land-use. We then demonstrate how the developed framework can be applied to urban planning for transit-oriented development by exploring practicable scenarios based on Singapore's urban plan toward 2030, which includes the development of "regional centers" (RCs) across the city-state.

artificial intelligence, decision tree learning, machine learning, (8 more...)

Country: Asia > Singapore (0.30)

Genre: Research Report (0.40)

Industry:

Transportation > Infrastructure & Services (1.00)
Law > Real Estate Law (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)

Rise Of Automated Trading: Machines Trading S&P 500

#artificialintelligence | Jun-17-2016, 08:20:34 GMT

Putting it all together, the following example shows the equity curve representing cumulative returns of the model strategy, with all values expressed in dollars. To increase the precision of forecasted values, instead of a standard probability of 0.5 (50 percent) we choose a higher threshold value, to be more confident that the model predicts an Up day. As we can see by the chart above, the equity curve is much better than before (Sharpe is 6.5 instead of 3.5), even with fewer round turns. From this point on, we will consider all next models with a threshold higher than a standard value. We can apply our research, as we did previously with the decision tree, into a Logistic Classifier model.
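
The thresholding step described in this excerpt can be illustrated with a short sketch. The function names, the 0.65 threshold, and the probability and outcome values below are made up for illustration; none of them come from the article.

```python
def up_day_signals(probabilities, threshold=0.5):
    """True where the predicted probability of an Up day exceeds the threshold."""
    return [p > threshold for p in probabilities]


def precision(signals, actual_up):
    """Fraction of predicted Up days that really were Up (None if no signals)."""
    taken = [a for s, a in zip(signals, actual_up) if s]
    return sum(taken) / len(taken) if taken else None


# Made-up model outputs and outcomes, for illustration only.
probs = [0.52, 0.81, 0.55, 0.93, 0.48, 0.67, 0.58, 0.74]
actual = [False, True, False, True, False, True, True, True]

if __name__ == "__main__":
    for t in (0.5, 0.65):
        s = up_day_signals(probs, t)
        print(f"threshold={t}: trades={sum(s)}, precision={precision(s, actual)}")
```

Raising the threshold trades fewer signals for a higher share of correct ones, which matches the excerpt's remark about a better equity curve with fewer round turns. Whether that trade-off holds on real data depends on how well calibrated the model's probabilities are.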

artificial intelligence, decision tree learning, machine learning, (16 more...)

#artificialintelligence

Country: North America > United States > New York > New York County > New York City (0.04)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.35)

The Effect of Heteroscedasticity on Regression Trees

Ruth, Will, Loughin, Thomas

arXiv.org Machine Learning | Jun-16-2016

Regression trees are becoming increasingly popular as omnibus predicting tools and as the basis of numerous modern statistical learning ensembles. Part of their popularity is their ability to create a regression prediction without ever specifying a structure for the mean model. However, the method implicitly assumes homogeneous variance across the entire explanatory-variable space. It is unknown how the algorithm behaves when faced with heteroscedastic data. In this study, we assess the performance of the most popular regression-tree algorithm in a single-variable setting under a very simple step-function model for heteroscedasticity. We use simulation to show that the locations of splits, and hence the ability to accurately predict means, are both adversely influenced by the change in variance. We identify the pruning algorithm as the main concern, although the effects on the splitting algorithm may be meaningful in some applications.
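
A rough sketch of the kind of experiment this abstract describes: a single-variable, least-squares split search in the style of CART, applied to data whose noise level changes as a step function. The abstract does not name the algorithm or the simulation settings, so the constant mean, the change point at 0.5, the standard deviations, and all function names here are assumptions.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the single split that minimises SSE.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse([y for _, y in pairs]))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        total = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best


def simulate(n, sd_left, sd_right, seed=0):
    """Assumed model: mean 0 everywhere, noise sd changes at x = 0.5."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [rng.gauss(0.0, sd_left if x < 0.5 else sd_right) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(200, sd_left=0.1, sd_right=2.0)
    print(best_split(xs, ys))
```

Because the true mean is constant in this assumed setup, any split the search reports is driven by noise alone, so repeating the simulation over many seeds and comparing where the splits land under equal and unequal variances gives a feel for the effect the authors study.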

dataset, regression tree, variance, (14 more...)

arXiv.org Machine Learning

1606.05273

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

ACDC: $\alpha$-Carving Decision Chain for Risk Stratification

Park, Yubin, Ho, Joyce, Ghosh, Joydeep

arXiv.org Machine Learning | Jun-16-2016

In many healthcare settings, intuitive decision rules for risk stratification can help effective hospital resource allocation. This paper introduces a novel variant of decision tree algorithms that produces a chain of decisions, not a general tree. Our algorithm, $\alpha$-Carving Decision Chain (ACDC), sequentially carves out "pure" subsets of the majority class examples. The resulting chain of decision rules yields a pure subset of the minority class examples. Our approach is particularly effective in exploring large and class-imbalanced health datasets. Moreover, ACDC provides an interactive interpretation in conjunction with visual performance metrics such as Receiver Operating Characteristics curve and Lift chart.

acdc, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1606.05325

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.50)
Health & Medicine > Therapeutic Area > Nephrology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.55)

Making Tree Ensembles Interpretable

Hara, Satoshi, Hayashi, Kohei

arXiv.org Machine Learning | Jun-16-2016

Tree ensembles, such as random forest and boosted trees, are renowned for their high prediction performance, whereas their interpretability is critically limited. In this paper, we propose a post processing method that improves the model interpretability of tree ensembles. After learning a complex tree ensemble in a standard way, we approximate it by a simpler model that is interpretable for humans. To obtain the simpler model, we derive the EM algorithm minimizing the KL divergence from the complex ensemble. A synthetic experiment showed that a complicated tree ensemble was approximated reasonably as interpretable.

artificial intelligence, decision tree learning, machine learning, (15 more...)

arXiv.org Machine Learning

1606.05390

Country: Asia (0.15)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)

Pruning Random Forests for Prediction on a Budget

Nan, Feng, Wang, Joseph, Saligrama, Venkatesh

arXiv.org Machine Learning | Jun-16-2016

We propose to prune a random forest (RF) for resource-constrained prediction. We first construct an RF and then prune it to optimize expected feature cost and accuracy. We pose pruning RFs as a novel 0-1 integer program with linear constraints that encourages feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm, scalable to large datasets. In contrast to our bottom-up approach, which benefits from good RF initialization, conventional methods are top-down, acquiring features based on their utility value, and are generally intractable, requiring heuristics. Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms.

artificial intelligence, machine learning, survey article, (21 more...)

arXiv.org Machine Learning

1606.05060

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.68)
(2 more...)
