Decision Tree Learning


Rectified Decision Trees: Towards Interpretability, Compression and Empirical Soundness

arXiv.org Machine Learning

How to obtain a model with both good interpretability and good performance has long been an important research topic. In this paper, we propose rectified decision trees (ReDT), a knowledge-distillation-based rectification of decision trees with high interpretability, small model size, and empirical soundness. Specifically, we extend the impurity calculation and the pure ending condition of the classical decision tree so that soft labels generated by a well-trained teacher model can be used in both training and prediction. To acquire the soft labels, we propose a new method based on multiple cross-validation that reduces the effects of randomness and overfitting. These approaches ensure that ReDT retains excellent interpretability and even uses fewer nodes than the classical decision tree, while still achieving relatively good performance. Moreover, in contrast to traditional knowledge distillation, ReDT does not necessarily require back-propagation through the student model, which makes it an attempt at a new knowledge distillation approach. Extensive experiments demonstrate the superiority of ReDT in interpretability, compression, and empirical soundness.
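The idea of computing impurity from soft labels can be illustrated with a minimal sketch. It assumes, purely for illustration, that a node's class distribution is the mean of the teacher's probability vectors for the samples in that node and that Gini impurity is applied to that mean. The abstract does not give the paper's exact definition, and the function name `soft_gini` is hypothetical.

```python
def soft_gini(soft_labels):
    """Gini impurity of a node from per-sample class-probability vectors.

    Assumption (not from the paper): the node's class distribution is the
    mean of the soft labels of the samples that fall into the node.
    """
    n = len(soft_labels)
    k = len(soft_labels[0])
    # Average the teacher's probabilities class by class.
    mean = [sum(row[c] for row in soft_labels) / n for c in range(k)]
    # Standard Gini impurity on the averaged distribution.
    return 1.0 - sum(p * p for p in mean)


# Two samples the teacher is fairly sure belong to class 0.
node = [[0.9, 0.1], [0.7, 0.3]]
print(soft_gini(node))
```

With hard (one-hot) labels this reduces to the usual Gini impurity, so the classical tree is a special case of the sketch.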


Want to know how to choose Machine Learning algorithm?

#artificialintelligence

Machine learning, which learns from the data provided to its algorithms, is the foundation for today's insights on customers, products, costs and revenues. Some of the most common examples are Netflix's algorithms, which suggest movies based on those you have watched in the past, and Amazon's algorithms, which recommend products based on what other customers have bought before. Decision Trees: Decision tree output is very easy to understand, even for people from a non-analytical background. No statistical knowledge is required to read and interpret it. Decision trees are also the fastest way to identify the most significant variables and the relations between two or more variables.


Derisking machine learning and artificial intelligence

#artificialintelligence

Machine learning and artificial intelligence are set to transform the banking industry, using vast amounts of data to build models that improve decision making, tailor services, and improve risk management. According to the McKinsey Global Institute, this could generate value of more than $250 billion in the banking industry. (For the purposes of this article, machine learning is broadly defined to include algorithms that learn from data without being explicitly programmed, including, for example, random forests, boosted decision trees, support-vector machines, deep learning, and reinforcement learning. The definition includes both supervised and unsupervised algorithms. For a full primer on the applications of artificial intelligence, we refer the reader to "An executive's guide to AI.") But there is a downside, since machine-learning models amplify some elements of model risk.


Unbiased Measurement of Feature Importance in Tree-Based Methods

arXiv.org Machine Learning

This paper examines split-improvement feature importance scores for tree-based methods. Starting with Classification and Regression Trees (CART; Breiman, 2017) and C4.5 (Quinlan, 2014), decision trees have been a workhorse of general machine learning, particularly within ensemble methods such as Random Forests (RF; Breiman, 2001) and Gradient Boosting Trees (Friedman, 2001). They enjoy the benefits of computational speed, few tuning parameters and natural ways of handling missing values.
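Split-improvement importance is commonly computed by crediting each feature with the weighted impurity decrease of every split that uses it. The sketch below shows that bookkeeping only. It assumes a hypothetical list-of-dicts description of a fitted tree's internal nodes (the field names are illustrative and not taken from the paper or from any library), and it does not implement the unbiased correction the paper studies.

```python
def split_improvement_importance(splits, n_total):
    """Sum the weighted impurity decrease of each split, per feature.

    Each entry of `splits` is a hypothetical record with the node's sample
    count and impurity, plus the same for its two children.
    """
    importance = {}
    for s in splits:
        decrease = (
            s["n"] * s["impurity"]
            - s["n_left"] * s["impurity_left"]
            - s["n_right"] * s["impurity_right"]
        ) / n_total
        importance[s["feature"]] = importance.get(s["feature"], 0.0) + decrease
    return importance


# One root split on "x1" that separates 100 samples into two pure halves.
splits = [
    {"feature": "x1", "n": 100, "impurity": 0.5,
     "n_left": 50, "impurity_left": 0.0,
     "n_right": 50, "impurity_right": 0.0},
]
print(split_improvement_importance(splits, n_total=100))
```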


Multinomial Random Forests: Fill the Gap between Theoretical Consistency and Empirical Soundness

arXiv.org Machine Learning

Random forests (RF) are one of the most widely used ensemble learning methods in classification and regression tasks. Despite their impressive performance, their theoretical consistency, which would ensure that their results converge to the optimum as the sample size increases, has been left far behind. Several consistent random forest variants have been proposed, yet all perform relatively poorly compared to the original random forests. In this paper, a novel RF framework named multinomial random forests (MRF) is proposed. In the MRF, an impurity-based multinomial distribution is constructed as the basis for the selection of a splitting point. This ensures that a certain degree of randomness is achieved while the overall quality of the trees is not much different from that of the original random forests. We prove the consistency of the MRF and demonstrate with multiple datasets that it performs similarly to the original random forests and better than existing consistent random forest variants for both classification and regression tasks.
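One way to picture an impurity-based multinomial distribution over splitting points is to turn each candidate's impurity decrease into a selection probability and then sample from it, rather than always taking the best candidate. The sketch below uses a softmax for that mapping. The softmax, the `temperature` parameter, and the function names are assumptions made for illustration; the abstract does not specify how the paper constructs its distribution.

```python
import math
import random


def split_probabilities(gains, temperature=1.0):
    """Map impurity decreases to selection probabilities via a softmax.

    `temperature` is a hypothetical knob: smaller values concentrate the
    probability mass on the best split.
    """
    m = max(gains)  # subtract the max for numerical stability
    weights = [math.exp((g - m) / temperature) for g in gains]
    total = sum(weights)
    return [w / total for w in weights]


def sample_split(gains, rng, temperature=1.0):
    """Draw one candidate split index from the multinomial distribution."""
    probs = split_probabilities(gains, temperature)
    return rng.choices(range(len(gains)), weights=probs, k=1)[0]


gains = [0.10, 0.30, 0.20]  # impurity decrease of three candidate splits
rng = random.Random(0)
print(split_probabilities(gains))
print(sample_split(gains, rng))
```

Better splits are chosen more often, but every candidate keeps a nonzero probability, which is the kind of controlled randomness the abstract describes.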


Rectangular Bounding Process

arXiv.org Artificial Intelligence

Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model -- the Rectangular Bounding Process (RBP) -- to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP in rich yet parsimonious expressiveness compared to the state-of-the-art methods.


Comprehensive Analysis of Dynamic Message Sign Impact on Driver Behavior: A Random Forest Approach

arXiv.org Machine Learning

This study investigates the potential effects of different Dynamic Message Signs (DMSs) on driver behavior using a full-scale high-fidelity driving simulator. Different DMSs are categorized by their content, structure, and type of messages. A random forest algorithm is used for three separate behavioral analyses (a route diversion analysis, a route choice analysis and a compliance analysis) to identify the potential and relative influences of different DMSs on these aspects of driver behavior. A total of 390 simulation runs are conducted using a sample of 65 participants from diverse socioeconomic backgrounds. The results suggest that DMSs displaying lane closure and delay information with advisory messages are most influential with regard to diversion, while color-coded DMSs and DMSs with avoid-route advice are the top contributors to route choice decisions and DMS compliance. In this first-of-a-kind study, based on the responses to the pre- and post-simulation surveys as well as results obtained from the analysis of driving-simulation-session data, the authors found that color-blind-friendly, color-coded DMSs are more effective than alphanumeric DMSs, especially in scenarios that demand high compliance from drivers. The increased effectiveness may be attributed to reduced comprehension time and the ease with which such DMSs are understood by a greater percentage of road users.


Improving Skin Condition Classification with a Visual Symptom Checker trained using Reinforcement Learning

arXiv.org Artificial Intelligence

We present a visual symptom checker that combines a pre-trained Convolutional Neural Network (CNN) with a Reinforcement Learning (RL) agent as a Question Answering (QA) model. This method enables us not only to increase the classification confidence and accuracy of the visual symptom checker, but also to decrease the average number of relevant questions asked to narrow down the differential diagnosis. By combining the CNN output in the form of classification probabilities as a part of the state structure of the simulated patient's environment, a DQN-based RL agent learns to ask about the symptom that maximizes its expected return over symptoms. We demonstrate that our RL approach increases the accuracy by more than 20% as compared to the CNN alone, and by up to 10% as compared to the decision tree model. We finally show that the RL approach not only outperforms the decision tree approach but also narrows down the diagnosis faster in terms of the average number of questions asked.


Finding the Root - Jason M. Pittman

#artificialintelligence

You may have thought we were done with decision trees. I am done with respect to discussing general approaches and types of problems. You could say that we're moving from a view of the forest to finding the root for our tree. However, there is a bit more to explore when it comes to the underlying mathematical functions associated with navigating data to construct our trees. In our last discussion, I introduced the concept of a cost function and gave a specific example in the Gini coefficient.
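As a rough illustration of how such a cost function helps find the root, the sketch below scores a candidate split by the size-weighted Gini of its two child groups (in the decision tree literature this measure is usually called Gini impurity). The function names and the toy labels are illustrative and not taken from the post.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a group of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_cost(left, right):
    """Cost of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# A split that separates the classes perfectly versus one that does not.
print(split_cost(["a", "a"], ["b", "b"]))
print(split_cost(["a", "b"], ["a", "b"]))
```

The candidate with the lowest cost would be chosen as the root split.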


8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset

#artificialintelligence

Has this happened to you? You are working on your dataset. You create a classification model and get 90% accuracy immediately. You dive a little deeper and discover that 90% of the data belongs to one class. This is an example of an imbalanced dataset and the frustrating results it can cause.
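The scenario is easy to reproduce with made-up numbers. In this minimal sketch, a "model" that always predicts the majority class reaches 90% accuracy on a 90/10 dataset while never finding a single minority example. The data and helper functions are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` examples that were predicted as such."""
    preds_for_positive = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positive) / len(preds_for_positive)


y_true = [0] * 90 + [1] * 10   # 90% of the data belongs to class 0
y_pred = [0] * 100             # always predict the majority class

print(accuracy(y_true, y_pred))            # 0.9
print(recall(y_true, y_pred, positive=1))  # 0.0
```

Looking at per-class recall alongside accuracy makes the problem visible immediately.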