Decision Tree Learning
Interactive Graphics for Visually Diagnosing Forest Classifiers in R
da Silva, Natalia, Cook, Dianne, Lee, Eun-Kyung
This paper describes structuring data and constructing plots to explore forest classification models interactively. A forest classifier is an example of an ensemble, produced by bagging multiple trees. The process of bagging and combining results from multiple trees, produces numerous diagnostics which, with interactive graphics, can provide a lot of insight into class structure in high dimensions. Various aspects are explored in this paper, to assess model complexity, individual model contributions, variable importance and dimension reduction, and uncertainty in prediction associated with individual observations. The ideas are applied to the random forest algorithm, and to the projection pursuit forest, but could be more broadly applied to other bagged ensembles. Interactive graphics are built in R, using the ggplot2, plotly, and shiny packages.
Scalable Bayesian Rule Lists
Yang, Hongyu, Rudin, Cynthia, Seltzer, Margo
We present an algorithm for building probabilistic rule lists that is two orders of magnitude faster than previous work. Rule list algorithms are competitors for decision tree algorithms. They are associative classifiers, in that they are built from pre-mined association rules. They have a logical structure that is a sequence of IF-THEN rules, identical to a decision list or one-sided decision tree. Instead of using greedy splitting and pruning like decision tree algorithms, we fully optimize over rule lists, striking a practical balance between accuracy, interpretability, and computational speed. The algorithm presented here uses a mixture of theoretical bounds (tight enough to have practical implications as a screening or bounding procedure), computational reuse, and highly tuned language libraries to achieve computational efficiency. Currently, for many practical problems, this method achieves better accuracy and sparsity than decision trees; further, in many cases, the computational time is practical and often less than that of decision trees. The result is a probabilistic classifier (which estimates P(y = 1|x) for each x) that optimizes the posterior of a Bayesian hierarchical model over rule lists.
On the Reliable Detection of Concept Drift from Streaming Unlabeled Data
Sethi, Tegjyot Singh, Kantardzic, Mehmed
Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and be able to adapt to them, over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used for validating the learned model. Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly and time consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of the unsupervised techniques stems from the exclusion of the characteristics of the learned classifier, from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier, as a metric to detect drift. The MD3 algorithm is a distribution independent, application independent, model independent, unsupervised and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms compared to unsupervised feature based drift detectors. The reduced false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme which is credible, label efficient and general in its applicability.
Random Forests for Big Data
Genuer, Robin, Poggi, Jean-Michel, Tuleau-Malot, Christine, Villa-Vialaneix, Nathalie
Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include online data and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method allowing to consider in a single and versatile framework regression problems, as well as two-class and multi-class classification problems. Focusing on classification problems, this paper proposes a selective review of available proposals that deal with scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities -- such as out-of-bag error and variable importance -- are addressed in these methods. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment five variants on two massive datasets (15 and 120 millions of observations), a simulated one as well as real world data. One variant relies on subsampling while three others are related to parallel implementations of random forests and involve either various adaptations of bootstrap to Big Data or to "divide-and-conquer" approaches. The fifth variant relates on online learning of random forests. These numerical experiments lead to highlight the relative performance of the different variants, as well as some of their limitations.
An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service
Saadi, Ismaรฏl, Wong, Melvin, Farooq, Bilal, Teller, Jacques, Cools, Mario
We propose the spatiotemporal estimation of the demand that is a function of variable effects related to traffic, pricing and weather conditions. With respect to the methodology, a single decision tree, bootstrap-aggregated (bagged) decision trees, random forest, boosted decision trees, and artificial neural network for regression have been adapted and systematically compared using various statistics, e.g. R-square, Root Mean Square Error (RMSE), and slope. To better assess the quality of the models, they have been tested on a real case study using the data of DiDi Chuxing, the main on-demand ride-hailing service provider in China. In the current study, 199,584 time-slots describing the spatiotemporal ride-hailing demand has been extracted with an aggregated-time interval of 10 mins. All the methods are trained and validated on the basis of two independent samples from this dataset. The results revealed that boosted decision trees provide the best prediction accuracy (RMSE 16.41), while avoiding the risk of over-fitting, followed by artificial neural network (20.09), random forest (23.50), bagged decision trees (24.29) and single decision tree (33.55).
Structural Data Recognition with Graph Model Boosting
Miyazaki, Tomo, Omachi, Shinichiro
This paper presents a novel method for structural data recognition using a large number of graph models. In general, prevalent methods for structural data recognition have two shortcomings: 1) Only a single model is used to capture structural variation. 2) Naive recognition methods are used, such as the nearest neighbor method. In this paper, we propose strengthening the recognition performance of these models as well as their ability to capture structural variation. The proposed method constructs a large number of graph models and trains decision trees using the models. This paper makes two main contributions. The first is a novel graph model that can quickly perform calculations, which allows us to construct several models in a feasible amount of time. The second contribution is a novel approach to structural data recognition: graph model boosting. Comprehensive structural variations can be captured with a large number of graph models constructed in a boosting framework, and a sophisticated classifier can be formed by aggregating the decision trees. Consequently, we can carry out structural data recognition with powerful recognition capability in the face of comprehensive structural variation. The experiments shows that the proposed method achieves impressive results and outperforms existing methods on datasets of IAM graph database repository.
Neural Decision Trees
In this paper we propose a synergistic melting of neural networks and decision trees (DT) we call neural decision trees (NDT). NDT is an architecture a la decision tree where each splitting node is an independent multilayer perceptron allowing oblique decision functions or arbritrary nonlinear decision function if more than one layer is used. This way, each MLP can be seen as a node of the tree. We then show that with the weight sharing asumption among those units, we end up with a Hashing Neural Network (HNN) which is a multilayer perceptron with sigmoid activation function for the last layer as opposed to the standard softmax. The output units then jointly represent the probability to be in a particular region. The proposed framework allows for global optimization as opposed to greedy in DT and differentiability w.r.t. all parameters and the input, allowing easy integration in any learnable pipeline, for example after CNNs for computer vision tasks. We also demonstrate the modeling power of HNN allowing to learn union of disjoint regions for final clustering or classification making it more general and powerful than standard softmax MLP requiring linear separability thus reducing the need on the inner layer to perform complex data transformations. We finally show experiments for supervised, semi-suppervised and unsupervised tasks and compare results with standard DTs and MLPs.
Indoor Localization by Fusing a Group of Fingerprints Based on Random Forests
Guo, Xiansheng, Ansari, Nirwan, Li, Huiyong
Indoor localization based on SIngle Of Fingerprint (SIOF) is rather susceptible to the changing environment, multipath, and non-line-of-sight (NLOS) propagation. Building SIOF is also a very time-consuming process. Recently, we first proposed a GrOup Of Fingerprints (GOOF) to improve the localization accuracy and reduce the burden of building fingerprints. However, the main drawback is the timeliness. In this paper, we propose a novel localization framework by Fusing A Group Of fingerprinTs (FAGOT) based on random forests. In the offline phase, we first build a GOOF from different transformations of the received signals of multiple antennas. Then, we design multiple GOOF strong classifiers based on Random Forests (GOOF-RF) by training each fingerprint in the GOOF. In the online phase, we input the corresponding transformations of the real measurements into these strong classifiers to obtain multiple independent decisions. Finally, we propose a Sliding Window aIded Mode-based (SWIM) fusion algorithm to balance the localization accuracy and time. Our proposed approaches can work better in an unknown indoor scenario. The burden of building fingerprints can also be reduced drastically. We demonstrate the performance of our algorithms through simulations and real experimental data using two Universal Software Radio Peripheral (USRP) platforms.
Day05: Recognition with random forest
Day05 brings another Kaggle competition, "Digit Recognizer". The goal in this competition is to take an image of a handwritten single digit, and determine what that digit is. The Jupyter Notebook for this little project is found here. Today, I think I'll use a random forest classifier. A random forest classifier is like an election where the outcome with the most votes (from the ensemble of decision trees) is the predicted classification.