Decision Tree Learning
Spatially Biased Random Forests
Mitchell, Benjamin (Villanova University) | Sheppard, John (Montana State University)
Recent successes in deep learning have led to explorations of what makes these techniques so powerful. One goal of such studies is to determine whether such properties can be transferred to alternative learning methods and yield similar benefits. Since the generalization power of any learning algorithm depends upon the inductive bias(es) of that algorithm, we hypothesize that utilizing a bias incorporated by CNNs and other deep methods--spatial locality--can benefit other learning methods as well. We test this hypothesis by incorporating spatial structure when constructing random forests. Our experiments demonstrate that incorporating a spatial locality bias improves the performance of random forests on several image classification tasks.
Survival of the Fittest in PlayerUnknown BattleGround
Rokad, Brij, Karumudi, Tushar, Acharya, Omkar, Jagtap, Akshay
The goal of this paper was to predict the placement in the multiplayer game PUBG (playerunknown battleground). In the game, up to one hundred players parachutes onto an island and scavenge for weapons and equipment to kill others, while avoiding getting killed themselves. The available safe area of the game map decreases in size over time, directing surviving players into tighter areas to force encounters. The last player or team standing wins the round. In this paper specifically, we have tried to predict the placement of the player in the ultimate survival test. The data set has been taken from Kaggle. Entire dataset has 29 attributes which are categories to 1 label(winPlacePerc), training set has 4.5 million instances and testing set has 1.9 million. winPlacePerc is continuous category, which makes it harder to predict the survival of the fittest. To overcome this problem, we have applied multiple machine learning models to find the optimum prediction. Model consists of LightGBM Regression (Light Gradient Boosting Machine Regression), MultiLayer Perceptron, M5P (improvement on C4.5) and Random Forest. To measure the error rate, Mean Absolute Error has been used. With the final prediction we have achieved MAE of 0.02047, 0.065, 0.0592 and 0634 respectively.
Experimental Evaluation of Individualized Treatment Rules
Imai, Kosuke, Li, Michael Lingzhi
In recent years, the increasing availability of individual-level data and the advancement of machine learning algorithms have led to the explosion of methodological development for finding optimal individualized treatment rules (ITRs). These new tools are being applied in a variety of fields including business, medicine, and politics. However, there exist few methods that empirically evaluate the efficacy of ITRs. In particular, many of the existing ITR estimators are based on complex models and do not come with statistical uncertainty estimates. We consider common real-world settings, in which policy makers wish to predict the performance of a given ITR prior to its administration in a target population. We propose to use a randomized experiment for evaluating ITRs. Unlike the existing methods, the proposed methodology is based on Neyman's repeated sampling approach and does not require modeling assumptions. As a result, it is applicable to the empirical evaluation of ITRs derived from a wide range of statistical and machine learning models. We conduct a simulation study to demonstrate the accuracy of the proposed methodology in small samples. We also apply our methods to the Project STAR (Student-Teacher Achievement Ratio) experiment to compare the performance of ITRs that are based on popular machine learning methods used for estimating heterogeneous treatment effects.
Resource-aware Elastic Swap Random Forest for Evolving Data Streams
Marrón, Diego, Ayguadé, Eduard, Herrero, José Ramon, Bifet, Albert
Continual learning based on data stream mining deals with ubiquitous sources of Big Data arriving at high-velocity and in real-time. Adaptive Random Forest ({\em ARF}) is a popular ensemble method used for continual learning due to its simplicity in combining adaptive leveraging bagging with fast random Hoeffding trees. While the default ARF size provides competitive accuracy, it is usually over-provisioned resulting in the use of additional classifiers that only contribute to increasing CPU and memory consumption with marginal impact in the overall accuracy. This paper presents Elastic Swap Random Forest ({\em ESRF}), a method for reducing the number of trees in the ARF ensemble while providing similar accuracy. {\em ESRF} extends {\em ARF} with two orthogonal components: 1) a swap component that splits learners into two sets based on their accuracy (only classifiers with the highest accuracy are used to make predictions); and 2) an elastic component for dynamically increasing or decreasing the number of classifiers in the ensemble. The experimental evaluation of {\em ESRF} and comparison with the original {\em ARF} shows how the two new components contribute to reducing the number of classifiers up to one third while providing almost the same accuracy, resulting in speed-ups in terms of per-sample execution time close to 3x.
Explainable AI for Trees: From Local Explanations to Global Understanding
Lundberg, Scott M., Erion, Gabriel, Chen, Hugh, DeGrave, Alex, Prutkin, Jordan M., Nair, Bala, Katz, Ronit, Himmelfarb, Jonathan, Bansal, Nisha, Lee, Su-In
Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.
Hybrid Predictive Model: When an Interpretable Model Collaborates with a Black-box Model
Interpretable machine learning has become a strong competitor for traditional black-box models. However, the possible loss of the predictive performance for gaining interpretability is often inevitable, putting practitioners in a dilemma of choosing between high accuracy (black-box models) and interpretability (interpretable models). In this work, we propose a novel framework for building a Hybrid Predictive Model (HPM) that integrates an interpretable model with any black-box model to combine their strengths. The interpretable model substitutes the black-box model on a subset of data where the black-box is overkill or nearly overkill, gaining transparency at no or low cost of the predictive accuracy. We design a principled objective function that considers predictive accuracy, model interpretability, and model transparency (defined as the percentage of data processed by the interpretable substitute.) Under this framework, we propose two hybrid models, one substituting with association rules and the other with linear models, and we design customized training algorithms for both models. We test the hybrid models on structured data and text data where interpretable models collaborate with various state-of-the-art black-box models. Results show that hybrid models obtain an efficient trade-off between transparency and predictive performance, characterized by our proposed efficient frontiers.
Formal Verification of Input-Output Mappings of Tree Ensembles
Törnblom, John, Nadjm-Tehrani, Simin
Recent advances in machine learning and artificial intelligence are now being considered in safety-critical autonomous systems where software defects may cause severe harm to humans and the environment. Design organizations in these domains are currently unable to provide convincing arguments that their systems are safe to operate when machine learning algorithms are used to implement their software. In this paper, we present an efficient method to extract equivalence classes from decision trees and tree ensembles, and to formally verify that their input-output mappings comply with requirements. The idea is that, given that safety requirements can be traced to desirable properties on system input-output patterns, we can use positive verification outcomes in safety arguments. This paper presents the implementation of the method in the tool VoTE (Verifier of Tree Ensembles), and evaluates its scalability on two case studies presented in current literature. We demonstrate that our method is practical for tree ensembles trained on low-dimensional data with up to 25 decision trees and tree depths of up to 20. Our work also studies the limitations of the method with high-dimensional data and preliminarily investigates the trade-off between large number of trees and time taken for verification.
Two-stage Best-scored Random Forest for Large-scale Regression
Hang, Hanyuan, Chen, Yingyi, Suykens, Johan A. K.
We propose a novel method designed for large-scale regression problems, namely the two-stage best-scored random forest (TBRF). "Best-scored" means to select one regression tree with the best empirical performance out of a certain number of purely random regression tree candidates, and "two-stage" means to divide the original random tree splitting procedure into two: In stage one, the feature space is partitioned into non-overlapping cells; in stage two, child trees grow separately on these cells. The strengths of this algorithm can be summarized as follows: First of all, the pure randomness in TBRF leads to the almost optimal learning rates, and also makes ensemble learning possible, which resolves the boundary discontinuities long plaguing the existing algorithms. Secondly, the two-stage procedure paves the way for parallel computing, leading to computational efficiency. Last but not least, TBRF can serve as an inclusive framework where different mainstream regression strategies such as linear predictor and least squares support vector machines (LS-SVMs) can also be incorporated as value assignment approaches on leaves of the child trees, depending on the characteristics of the underlying data sets. Numerical assessments on comparisons with other state-of-the-art methods on several large-scale real data sets validate the promising prediction accuracy and high computational efficiency of our algorithm.
Best-scored Random Forest Density Estimation
This paper presents a brand new nonparametric density estimation strategy named the best-scored random forest density estimation whose effectiveness is supported by both solid theoretical analysis and significant experimental performance. The terminology best-scored stands for selecting one density tree with the best estimation performance out of a certain number of purely random density tree candidates and we then name the selected one the best-scored random density tree. In this manner, the ensemble of these selected trees that is the best-scored random density forest can achieve even better estimation results than simply integrating trees without selection. From the theoretical perspective, by decomposing the error term into two, we are able to carry out the following analysis: First of all, we establish the consistency of the best-scored random density trees under $L_1$-norm. Secondly, we provide the convergence rates of them under $L_1$-norm concerning with three different tail assumptions, respectively. Thirdly, the convergence rates under $L_{\infty}$-norm is presented. Last but not least, we also achieve the above convergence rates analysis for the best-scored random density forest. When conducting comparative experiments with other state-of-the-art density estimation approaches on both synthetic and real data sets, it turns out that our algorithm has not only significant advantages in terms of estimation accuracy over other methods, but also stronger resistance to the curse of dimensionality.
Development of a Forecasting and Warning System on the Ecological Life-Cycle of Sunn Pest
Balaban, İsmail, Acun, Fatih, Arpalı, Onur Yiğit, Murat, Furkan, Babaroğlu, Numan Ertuğrul, Akci, Emre, Çulcu, Mehmet, Özkan, Mümtaz, Temizer, Selim
We provide a machine learning solution that replaces the traditional methods for deciding the pesticide application time of Sunn Pest. We correlate climate data with phases of Sunn Pest in its life-cycle and decide whether the fields should be sprayed. Our solution includes two groups of prediction models. The first group contains decision trees that predict migration time of Sunn Pest from winter quarters to wheat fields. The second group contains random forest models that predict the nymphal stage percentages of Sunn Pest which is a criterion for pesticide application. We trained our models on four years of climate data which was collected from Kir\c{s}ehir and Aksaray. The experiments show that our promised solution make correct predictions with high accuracies.