AITopics | Decision Tree Learning


Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
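To make the quoted definition concrete, here is a minimal sketch of a decision tree that represents a Boolean function. The tuple encoding and the attribute names (`raining`, `weekend`) are assumptions made for illustration; they do not come from the textbook.

```python
# A decision tree as nested tuples: (attribute, subtree_if_false, subtree_if_true).
# Leaves are plain booleans. The attribute names are hypothetical.
TREE = ("raining",
        ("weekend", False, True),   # not raining: answer "yes" only on weekends
        False)                      # raining: answer "no"

def decide(tree, example):
    """Walk the tree, testing one attribute per internal node, until a leaf."""
    while not isinstance(tree, bool):
        attribute, if_false, if_true = tree
        tree = if_true if example[attribute] else if_false
    return tree

print(decide(TREE, {"raining": False, "weekend": True}))   # True
```

This particular tree computes the Boolean function (not raining) and weekend; a larger output range would only require non-Boolean leaf values.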


Optimal Resampling for Learning Small Models

Ghose, Abhishek, Ravindran, Balaraman

arXiv.org Machine Learning, May-4-2019

Models often need to be constrained to a certain size to be considered interpretable; e.g., a decision tree of depth 5 is much easier to make sense of than one of depth 30. This suggests a trade-off between interpretability and accuracy. Our work tries to minimize this trade-off by suggesting the optimal distribution of the data to learn from, which, surprisingly, may differ from the original distribution. We use an Infinite Beta Mixture Model (IBMM) to represent a specific set of sampling schemes. The parameters of the IBMM are learned using a Bayesian Optimizer (BO). While, even under simplistic assumptions, a distribution in the original $d$-dimensional space would need to optimize for $O(d)$ variables (cumbersome for most real-world data), our technique lowers this number significantly to a fixed set of 8 variables at the cost of some additional preprocessing. The proposed technique is model-agnostic; it can be applied to any classifier. It also admits a general notion of model size. We demonstrate its effectiveness using multiple real-world datasets to construct decision trees, linear probability models and gradient boosted models.

decision tree learning, density tree, health & medicine, (21 more...)

arXiv.org Machine Learning

1905.0152

Country:

North America > United States > New York (0.14)
Oceania > Australia (0.14)
North America > United States > Wisconsin (0.14)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Materials > Chemicals > Industrial Gases > Liquified Gas (0.46)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.46)
Energy > Oil & Gas > Midstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)


Matlab vs. OpenCV: A Comparative Study of Different Machine Learning Algorithms

Elsayed, Ahmed A., Yousef, Waleed A.

arXiv.org Machine Learning, May-3-2019

Scientific computing relies on executing computer algorithms coded in some programming language. Given particular available hardware, algorithm speed is a crucial factor. There are many scientific computing environments used to code such algorithms. Matlab is one of the most successful and widespread scientific computing environments, rich in toolboxes, libraries, and data visualization tools. OpenCV is a C++-based library written primarily for computer vision and related areas. This paper presents a comparative study using 20 different real datasets to compare the speed of Matlab and OpenCV for some machine learning algorithms. Although Matlab is more convenient for development and data presentation, OpenCV is much faster in execution, with a speed ratio exceeding 80 in some cases. The best of both worlds can be achieved by using Matlab or a similar environment to select the most successful algorithm, then implementing the selected algorithm in OpenCV or a similar environment to gain speed.

artificial intelligence, machine learning, programming language, (17 more...)

arXiv.org Machine Learning

1905.01213

Country:

North America > United States > Massachusetts (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)


Interpretable multiclass classification by MDL-based rule lists

Proença, Hugo M., van Leeuwen, Matthijs

arXiv.org Artificial Intelligence, May-1-2019

Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In this paper, we consider the problem of learning compact yet accurate probabilistic rule lists for multiclass classification. Specifically, we propose a novel formalization based on probabilistic rule lists and the minimum description length (MDL) principle. This results in virtually parameter-free model selection that naturally trades off model complexity against goodness of fit, so that overfitting and the need for hyperparameter tuning are effectively avoided. Finally, we introduce the Classy algorithm, which greedily finds rule lists according to the proposed criterion. We empirically demonstrate that Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers when it comes to the combination of predictive performance and interpretability. We show that Classy is insensitive to its only parameter, i.e., the candidate set, and that compression on the training set correlates with classification performance, validating our MDL-based selection criterion.
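As a rough illustration of what a probabilistic rule list is, the sketch below checks rules in order and returns the class distribution of the first rule that fires. The rules, feature names, and probabilities are invented for the example, and `data_bits` is only the generic data-encoding term of a two-part MDL score (the sum of -log2 P(y | x)); it is not the Classy algorithm or the paper's exact encoding.

```python
import math

# Each rule is (condition, class_probabilities); rules are checked in order.
# All rules and numbers here are hypothetical.
RULES = [
    (lambda x: x["legs"] == 0, {"fish": 0.8, "bird": 0.1, "mammal": 0.1}),
    (lambda x: x["feathers"],  {"fish": 0.05, "bird": 0.9, "mammal": 0.05}),
]
DEFAULT = {"fish": 0.1, "bird": 0.1, "mammal": 0.8}

def predict_proba(x):
    """Return the class distribution of the first matching rule, else the default."""
    for condition, probs in RULES:
        if condition(x):
            return probs
    return DEFAULT

def predict(x):
    """Return the most probable class under the matching rule."""
    probs = predict_proba(x)
    return max(probs, key=probs.get)

def data_bits(examples):
    """Bits needed to encode the labels given the rule list: sum of -log2 P(y | x)."""
    return sum(-math.log2(predict_proba(x)[y]) for x, y in examples)

print(predict({"legs": 2, "feathers": True}))   # bird
```

A shorter `data_bits` on held-in data means the list compresses the labels better; an MDL criterion would add the cost of encoding the rules themselves, which this sketch leaves out.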

artificial intelligence, machine learning, rule list, (19 more...)

arXiv.org Artificial Intelligence

1905.00328

Genre: Research Report > New Finding (0.46)

Industry:

Materials > Metals & Mining (0.34)
Health & Medicine (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
(4 more...)


Factor Analysis in Fault Diagnostics Using Random Forest

Amruthnath, Nagdev, Gupta, Tarun

arXiv.org Machine Learning, Apr-30-2019

Factor analysis, sometimes referred to as variable analysis, has been extensively used in classification problems for identifying specific factors that are significant to particular classes. This type of analysis has been widely used in applications such as customer segmentation, medical research, network traffic, and image and video classification. Today, factor analysis is prominently used in fault diagnosis of machines to identify the significant factors and to study the root cause of a specific machine fault. The advantage of performing factor analysis in machine maintenance is that it enables prescriptive analysis (which helps answer what actions to take) and preemptive analysis (which helps answer how to eliminate the failure mode). In this paper, a real case of an industrial rotating machine was considered, where vibration and ambient temperature data were collected for monitoring the health of the machine. Gaussian mixture model-based clustering was used to cluster the data into significant groups, and spectrum analysis was used to diagnose each cluster to a specific state of the machine. The significant features that contribute to a particular mode of the machine were identified using the random forest classification model. The significant features for specific modes of the machine were used to conclude that the clusters generated are distinct and have a unique set of significant features.

artificial intelligence, factor analysis, machine learning, (14 more...)

arXiv.org Machine Learning

doi: 10.4172/2169-0316.1000278

1904.13366

Country:

North America > United States > Michigan > Kalamazoo County > Kalamazoo (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > Wisconsin (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)


Why should you trust my interpretation? Understanding uncertainty in LIME predictions

Tan, Hui Fen, Song, Kuangyan, Udell, Madeleine, Sun, Yiming, Zhang, Yujia

arXiv.org Artificial Intelligence, Apr-29-2019

Methods for interpreting machine learning black-box models increase the outcomes' transparency and in turn generate insight into the reliability and fairness of the algorithms. However, the interpretations themselves could contain significant uncertainty that undermines the trust in the outcomes and raises concern about the model's reliability. Focusing on the method "Local Interpretable Model-agnostic Explanations" (LIME), we demonstrate the presence of two sources of uncertainty, namely the randomness in its sampling procedure and the variation of interpretation quality across different input data points. Such uncertainty is present even in models with high training and test accuracy. We apply LIME to synthetic data and two public data sets, text classification in 20 Newsgroup and recidivism risk-scoring in COMPAS, to support our argument.
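The first source of uncertainty, randomness in the sampling procedure, can be illustrated without the LIME library. The sketch below fits an unweighted local linear surrogate to a stand-in black box using random perturbations around one point, then repeats the fit with different seeds. The function `black_box`, the sample count, and the perturbation scale are all assumptions; real LIME additionally weights samples by proximity and works on interpretable features.

```python
import math
import random

def black_box(x):
    # Stand-in for an opaque model; any nonlinear function works here.
    return math.sin(3.0 * x) + 0.5 * x * x

def local_slope(x0, seed, n_samples=20, scale=0.5):
    """Fit y ~ a + b*x by least squares on random perturbations of x0; return b."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0.0, scale) for _ in range(n_samples)]
    ys = [black_box(x) for x in xs]
    mean_x = sum(xs) / n_samples
    mean_y = sum(ys) / n_samples
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# The same point explained five times, each with a different random sample.
slopes = [local_slope(1.0, seed) for seed in range(5)]
print([round(s, 3) for s in slopes])
```

The spread of the printed slopes is the explanation's sampling variability at that point; increasing `n_samples` should narrow it.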

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

1904.12991

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.32)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)


Optimal Sparse Decision Trees

Hu, Xiyang, Rudin, Cynthia, Seltzer, Margo

arXiv.org Machine Learning, Apr-29-2019

Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980's. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. Hardness of decision tree optimization is both a theoretical and practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. We highlight possible steps to improving the scalability and speed of future generations of this algorithm based on insights from our theory and experiments.
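A small sketch of why greedy tree growing can be myopic on binary variables: on XOR-labelled data, no single split reduces the training error, so a learner that demands an immediate gain stops at the root, yet a depth-2 tree is perfect. This is a textbook illustration, not the paper's algorithm; all function names are hypothetical.

```python
from itertools import product

# Four points over two binary features, labelled by XOR.
DATA = [((a, b), a ^ b) for a, b in product((0, 1), repeat=2)]

def errors(labels):
    """Misclassifications made by predicting the majority label in a leaf."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)

def split_errors(data, feature):
    """Total leaf errors after one split on the given binary feature."""
    left = [y for x, y in data if x[feature] == 0]
    right = [y for x, y in data if x[feature] == 1]
    return errors(left) + errors(right)

def depth2_errors(data, first, second):
    """Total leaf errors after splitting on `first`, then on `second` in each branch."""
    total = 0
    for value in (0, 1):
        branch = [(x, y) for x, y in data if x[first] == value]
        total += split_errors(branch, second)
    return total

root_errors = errors([y for _, y in DATA])
one_split = [split_errors(DATA, f) for f in (0, 1)]
two_splits = depth2_errors(DATA, 0, 1)
print(root_errors, one_split, two_splits)   # 2 [2, 2] 0
```

An exhaustive search over trees would find the depth-2 solution directly, at a cost that grows quickly with the number of features; that cost is what search-space bounds aim to reduce.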

artificial intelligence, leaves, machine learning, (17 more...)

arXiv.org Machine Learning

1904.12847

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Asymmetric Impurity Functions, Class Weighting, and Optimal Splits for Binary Classification Trees

Zimmermann, David

arXiv.org Machine Learning, Apr-29-2019

We investigate how asymmetrizing an impurity function affects the choice of optimal node splits when growing a decision tree for binary classification. In particular, we relax the usual axioms of an impurity function and show how skewing an impurity function biases the optimal splits to isolate points of a particular class when splitting a node. We give a rigorous definition of this notion, then give a necessary and sufficient condition for such a bias to hold. We also show that the technique of class weighting is equivalent to applying a specific transformation to the impurity function, and tie all these notions together for a class of impurity functions that includes the entropy and Gini impurity. We also briefly discuss cost-insensitive impurity functions and give a characterization of such functions.
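The effect of class weighting on split choice can be sketched with the Gini impurity. In the example below, each class-1 point counts `w` times when computing node proportions and node masses; the label sequence and the weight value are made up, and the code shows only the generic weighted-Gini computation, not the paper's formal conditions.

```python
def gini(p):
    """Gini impurity of a binary node whose (weighted) class-1 proportion is p."""
    return 2.0 * p * (1.0 - p)

def split_cost(left, right, w=1.0):
    """Mass-weighted average of child impurities; class-1 points count w times."""
    total = 0.0
    cost = 0.0
    for labels in (left, right):
        n1 = sum(labels)
        n0 = len(labels) - n1
        mass = n0 + w * n1
        if mass > 0:
            cost += mass * gini(w * n1 / mass)
        total += mass
    return cost / total

def best_split(labels, w=1.0):
    """Index i minimizing the cost of splitting into labels[:i] and labels[i:]."""
    return min(range(1, len(labels)),
               key=lambda i: split_cost(labels[:i], labels[i:], w))

# Labels of eight points already sorted by a single feature (hypothetical data).
LABELS = [0, 0, 0, 1, 0, 0, 1, 1]

print(best_split(LABELS, w=1.0))    # 6: right child [1, 1] is pure class 1
print(best_split(LABELS, w=10.0))   # 3: left child [0, 0, 0] is pure class 0
```

With equal weights the best split peels off the pure class-1 tail; up-weighting class 1 makes any node that mixes in class-1 points expensive, so the split moves to isolate the largest pure class-0 block instead.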

artificial intelligence, decision tree learning, machine learning, (20 more...)

arXiv.org Machine Learning

1904.12465

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Distributed Classification of Urban Congestion Using VANET

Al Mallah, Ranwa, Farooq, Bilal, Quintero, Alejandro

arXiv.org Machine Learning, Apr-26-2019

Vehicular Ad-hoc NETworks (VANET) can efficiently detect traffic congestion, but detection is not enough because congestion can be further classified as recurrent and non-recurrent congestion (NRC). In particular, NRC in an urban network is mainly caused by incidents, work zones, special events and adverse weather. We propose a framework for the real-time distributed classification of congestion into its components on a heterogeneous urban road network using VANET. We present models built on an understanding of the spatial and temporal causality measures and trained on synthetic data extended from a real case study of Cologne. Our performance evaluation shows a predictive accuracy of 87.63% for the deterministic Classification Tree (CT), 88.83% for the Naive Bayesian classifier (NB), 89.51% for Random Forest (RF) and 89.17% for the boosting technique. This framework can assist transportation agencies in reducing urban congestion by developing effective congestion mitigation strategies knowing the root causes of congestion.

artificial intelligence, congestion, machine learning, (20 more...)

arXiv.org Machine Learning

1904.12685

Country: North America > Canada (0.48)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (0.90)
Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)


Formal Verification of Decision-Tree Ensemble Model and Detection of its Violating-input-value Ranges

Sato, Naoto, Kuruma, Hironobu, Nakagawa, Yuichiroh, Ogawa, Hideto

arXiv.org Artificial Intelligence, Apr-26-2019

As one type of machine-learning model, a "decision-tree ensemble model" (DTEM) is represented by a set of decision trees. A DTEM is mainly known to be valid for structured data; however, like other machine-learning models, it is difficult to train so that it returns the correct output value for any input value. Accordingly, when a DTEM is used in a system that requires reliability, it is important to comprehensively detect input values that lead to malfunctions of the system (failures) during development and take appropriate measures. One conceivable solution is to install an input filter that controls the input to the DTEM, and to use separate software to process input values that may lead to failures. To develop the input filter, it is necessary to specify the filtering condition of the input value that leads to the malfunction of the system. Given that necessity, in this paper, we propose a method for formally verifying a DTEM and, if the verification finds an input value leading to a failure, extracting the range in which such an input value exists. The proposed method can comprehensively extract the range in which the input value leading to the failure exists; therefore, by creating an input filter based on that range, it is possible to prevent the failure from occurring in the system. In this paper, the algorithm of the proposed method is described, and the results of a case study using a dataset of house prices are presented. On the basis of those results, the feasibility of the proposed method is demonstrated, and its scalability is evaluated.
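The idea of extracting violating input ranges can be sketched on a heavily simplified case: a single regression tree over one numeric input, checked by enumerating its leaves rather than by a formal verifier. The tree, its thresholds, and the property (predicted price must be non-negative) are assumptions made for the example; an ensemble over many features would need the kind of systematic search the abstract describes.

```python
import math

# Internal node: (threshold, left_subtree, right_subtree), going left when x <= threshold.
# Leaf: a float prediction. Hypothetical toy tree over a single input x.
TREE = (50.0,
        (20.0, 100.0, 180.0),
        (80.0, 250.0, -30.0))

def leaf_intervals(tree, low=-math.inf, high=math.inf):
    """Yield (low, high, prediction) per leaf; the leaf covers low < x <= high."""
    if not isinstance(tree, tuple):
        yield (low, high, tree)
        return
    threshold, left, right = tree
    yield from leaf_intervals(left, low, min(high, threshold))
    yield from leaf_intervals(right, max(low, threshold), high)

def violating_ranges(tree, is_valid):
    """Input intervals whose leaf prediction fails the property is_valid."""
    return [(lo, hi) for lo, hi, pred in leaf_intervals(tree) if not is_valid(pred)]

print(violating_ranges(TREE, lambda price: price >= 0))   # [(80.0, inf)]
```

An input filter built from this result would route any x greater than 80 away from the tree, which is the role the extracted range plays in the abstract.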

artificial intelligence, machine learning, violation range, (17 more...)

arXiv.org Artificial Intelligence

1904.11753

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.83)


From Predictions to Prescriptions in Multistage Optimization Problems

Bertsimas, Dimitris, McCord, Christopher

arXiv.org Machine Learning, Apr-25-2019

In this paper, we introduce a framework for solving finite-horizon multistage optimization problems under uncertainty in the presence of auxiliary data. We assume the joint distribution of the uncertain quantities is unknown, but noisy observations, along with observations of auxiliary covariates, are available. We utilize effective predictive methods from machine learning (ML), including $k$-nearest neighbors regression ($k$NN), classification and regression trees (CART), and random forests (RF), to develop specific methods that are applicable to a wide variety of problems. We demonstrate that our solution methods are asymptotically optimal under mild conditions. Additionally, we establish finite sample guarantees for the optimality of our method with $k$NN weight functions. Finally, we demonstrate the practicality of our approach with computational examples. We see a significant decrease in cost by taking into account the auxiliary data in the multistage setting.
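To show how a $k$NN weight function can turn predictions into a decision, here is a single-stage sketch: observations whose covariate is among the k nearest to the new covariate get weight 1/k, and the decision minimizing the weighted historical cost is chosen. This is a deliberate simplification of the multistage setting in the abstract; the data, the newsvendor-style cost, and the candidate grid are all invented for illustration.

```python
def knn_weights(history, x, k):
    """Weight 1/k on the k observations whose covariate is closest to x, else 0."""
    order = sorted(range(len(history)), key=lambda i: abs(history[i][0] - x))
    nearest = set(order[:k])
    return [1.0 / k if i in nearest else 0.0 for i in range(len(history))]

def prescribe(history, x, k, candidates, cost):
    """Pick the candidate decision with the lowest kNN-weighted average cost."""
    weights = knn_weights(history, x, k)

    def weighted_cost(z):
        return sum(w * cost(z, y) for w, (_, y) in zip(weights, history))

    return min(candidates, key=weighted_cost)

# Hypothetical history of (auxiliary covariate, observed demand) pairs.
HISTORY = [(1.0, 10), (1.2, 12), (5.0, 40), (5.5, 44), (9.0, 90)]

def newsvendor_cost(order, demand):
    # Made-up costs: 1 per unit of overstock, 3 per unit of unmet demand.
    return 1.0 * max(order - demand, 0) + 3.0 * max(demand - order, 0)

print(prescribe(HISTORY, x=5.2, k=2, candidates=range(0, 101, 2), cost=newsvendor_cost))   # 44
```

Ignoring the covariate would average over all five demands, including the outlying 90; conditioning on the two nearest neighbours is what lets the auxiliary data lower the cost.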

artificial intelligence, machine learning, weight function, (14 more...)

arXiv.org Machine Learning

1904.11637

Country: North America > United States > Massachusetts > Middlesex County (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
