

 Decision Tree Learning


13 Decision Trees and Multi-Valued Attributes J. R. Quinlan

AI Classics

The traditional approach involving protracted interaction between a knowledge engineer and a domain expert is viable only to the extent that both these resources are available; this approach will not meet the apparently exponential growth in demand for expert systems. A solution to this dilemma requires rethinking the way knowledge-based products are built. An example of this reappraisal of methodology appears in Michie (1983), and is based on the principle of formalizing and refining the knowledge implicit in collections of examples or data bases. Dietterich and Michalski (1983) give an overview of methods for learning from examples. There are many such methods, all based on the idea of inductive generalization. One of the simplest dates back to work by Hunt in the late fifties (Hunt et al., 1966). Each given example, described by measuring certain fixed properties, belongs to a known class, and the 'learning' takes the form of developing a classification rule that can then be applied to new objects. Simple though it may be, derivatives of this method have achieved useful results; Kononenko et al. (1984), for example, have managed to generate five medical diagnosis systems with minimal reference to diagnosticians.
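The learning-from-examples scheme described above can be illustrated with a minimal sketch. It is not Hunt's or Quinlan's actual procedure: the attribute-selection criterion (information gain), the function names, and the toy examples are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_attribute(examples, attributes):
    """Pick the attribute with the highest information gain (an assumed criterion)."""
    base = entropy([cls for _, cls in examples])

    def gain(attr):
        remainder = 0.0
        for value in {props[attr] for props, _ in examples}:
            subset = [cls for props, cls in examples if props[attr] == value]
            remainder += len(subset) / len(examples) * entropy(subset)
        return base - remainder

    return max(attributes, key=gain)


def build_tree(examples, attributes):
    """Recursively split the examples until a subset holds a single class."""
    classes = [cls for _, cls in examples]
    if len(set(classes)) == 1 or not attributes:
        return Counter(classes).most_common(1)[0][0]  # leaf: majority class
    attr = best_attribute(examples, attributes)
    rest = [a for a in attributes if a != attr]
    branches = {}
    for value in {props[attr] for props, _ in examples}:
        subset = [(p, c) for p, c in examples if p[attr] == value]
        branches[value] = build_tree(subset, rest)
    return (attr, branches)


def classify(tree, props):
    """Apply the learned rule to a new object described by the same properties."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[props[attr]]
    return tree


# Hypothetical toy examples: (measured properties, known class).
EXAMPLES = [
    ({"fever": "yes", "cough": "yes"}, "flu"),
    ({"fever": "yes", "cough": "no"}, "flu"),
    ({"fever": "no", "cough": "yes"}, "cold"),
    ({"fever": "no", "cough": "no"}, "healthy"),
]

tree = build_tree(EXAMPLES, ["fever", "cough"])
print(classify(tree, {"fever": "no", "cough": "yes"}))
```

The learned rule is a nested structure of (attribute, branches) pairs with class labels at the leaves; an attribute value never seen during training raises a `KeyError` in this sketch.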


12 Generating Expert Rules from Examples in PROLOG B. Arbab* D. Michie

AI Classics

It is assumed that Si are sorted in increasing order of s(Si). Non-linearities of four trees are shown in Figure 6. T1 is absolutely linear; thus its non-linearity measure is zero. T2 is very close to being a balanced tree: non-linearity one. T3 is preferred to T4, i.e. this function is sensitive to the location of non-linearity within a tree (the lower a non-linearity occurs in a tree, the lower (better) its measure).


Local Decorrelation For Improved Pedestrian Detection

Neural Information Processing Systems

Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology of the data. Given highly correlated data, decision trees with oblique (multiple feature) splits can be effective. Use of oblique splits, however, comes at considerable computational expense. Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. The result is an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees. In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost. The overall improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we reduce false positives nearly tenfold over the previous state-of-the-art.


Mondrian Forests: Efficient Online Random Forests

Neural Information Processing Systems

Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as Breiman's random forest and extremely randomized trees) operate on batches of training data. Online methods are now in greater demand. Existing online random forests, however, require more training data than their batch counterpart to achieve comparable predictive performance. In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests. Mondrian forests can be grown in an incremental/online fashion and remarkably, the distribution of online Mondrian forests is the same as that of batch Mondrian forests. Mondrian forests achieve competitive predictive performance comparable with existing online random forests and periodically re-trained batch random forests, while being more than an order of magnitude faster, thus representing a better computation vs accuracy tradeoff.
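As a rough illustration of the Mondrian process on which these forests are built, the sketch below samples a random axis-aligned partition of a box. It follows the standard generative description of the process (exponential waiting time with rate equal to the sum of side lengths, cut dimension chosen in proportion to side length, uniform cut point). The function name and the lifetime budget value are assumptions, and the sketch ignores training data entirely, so it is not the Mondrian forest algorithm itself.

```python
import random


def sample_mondrian(box, budget, rng):
    """Recursively partition `box`, a list of (low, high) pairs, one per dimension.

    Returns the list of leaf boxes. `budget` is the remaining lifetime.
    """
    lengths = [high - low for low, high in box]
    # Time to the next cut is exponential with rate equal to the sum of side lengths.
    cost = rng.expovariate(sum(lengths))
    if cost > budget:
        return [box]  # lifetime exhausted: this box becomes a leaf
    # Choose the cut dimension in proportion to side length, then a uniform cut point.
    dim = rng.choices(range(len(box)), weights=lengths)[0]
    low, high = box[dim]
    cut = rng.uniform(low, high)
    left = box[:dim] + [(low, cut)] + box[dim + 1:]
    right = box[:dim] + [(cut, high)] + box[dim + 1:]
    remaining = budget - cost
    return sample_mondrian(left, remaining, rng) + sample_mondrian(right, remaining, rng)


def area(box):
    """Product of side lengths of a box."""
    result = 1.0
    for low, high in box:
        result *= high - low
    return result


rng = random.Random(0)
leaves = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], budget=2.0, rng=rng)
print(len(leaves), sum(area(leaf) for leaf in leaves))
```

Whatever the random draws, the leaf boxes tile the original box, so their areas sum to the area of the unit square.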


Multi-Class Deep Boosting

Neural Information Processing Systems

We present new ensemble learning algorithms for multi-class classification. Our algorithms can use as a base classifier set a family of deep decision trees or other rich or complex families and yet benefit from strong generalization guarantees. We give new data-dependent learning bounds for convex ensembles in the multi-class classification setting expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. These bounds are finer than existing ones both thanks to an improved dependency on the number of classes and, more crucially, by virtue of a more favorable complexity term expressed as an average of the Rademacher complexities based on the ensemble’s mixture weights. We introduce and discuss several new multi-class ensemble algorithms benefiting from these guarantees, prove positive results for the H-consistency of several of them, and report the results of experiments showing that their performance compares favorably with that of multi-class versions of AdaBoost and Logistic Regression and their L1-regularized counterparts.


Distributed Decision Trees

arXiv.org Machine Learning

The recently proposed budding tree is a decision tree algorithm in which every node is part internal node and part leaf. This allows representing every decision tree in a continuous parameter space, and therefore a budding tree can be jointly trained with backpropagation, like a neural network. Even though this continuity allows it to be used in hierarchical representation learning, the learned representations are local: activation makes a soft selection among all root-to-leaf paths in a tree. In this work we extend the budding tree and propose the distributed tree, where the children use different and independent splits and hence multiple paths in a tree can be traversed at the same time. This ability to combine multiple paths gives the power of a distributed representation, as in a traditional perceptron layer. We show that distributed trees perform comparably to or better than budding and traditional hard trees on classification and regression tasks.
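The difference between soft selection among paths and independent splits can be sketched at the level of a single node. The sigmoid gating form, the function names, and the parameter values below are assumptions for illustration, not the paper's exact formulation (which also includes a per-node leaf component that is omitted here).

```python
from math import exp


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def coupled_gates(x, w, b):
    """Soft-selection gating: one split, child weights g and 1 - g."""
    g = sigmoid(w * x + b)
    return g, 1.0 - g


def independent_gates(x, w_left, b_left, w_right, b_right):
    """Distributed-style gating: each child has its own independent split."""
    return sigmoid(w_left * x + b_left), sigmoid(w_right * x + b_right)


def node_response(gates, left_value, right_value):
    """Gate-weighted combination of the two children's responses."""
    g_left, g_right = gates
    return g_left * left_value + g_right * right_value


x = 0.5
soft = coupled_gates(x, w=2.0, b=0.0)
dist = independent_gates(x, w_left=2.0, b_left=0.0, w_right=2.0, b_right=1.0)

print(sum(soft))  # coupled gates always sum to 1: a soft choice of one path
print(sum(dist))  # independent gates need not: both paths can be active
print(node_response(soft, 1.0, -1.0), node_response(dist, 1.0, -1.0))
```

With coupled gates, raising the weight of one child necessarily lowers the other, which is what makes the representation local; with independent gates, both children can contribute strongly at once.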


bartMachine: Machine Learning with Bayesian Additive Regression Trees

arXiv.org Machine Learning

Ensemble-of-trees methods have become popular choices for forecasting in both regression and classification problems. Algorithms such as random forests (Breiman 2001) and stochastic gradient boosting (Friedman 2002) are two well-established and widely employed procedures. Recent advances in ensemble methods include dynamic trees (Taddy, Gramacy, and Polson 2011) and Bayesian additive regression trees (BART, Chipman, George, and McCulloch 2010), which depart from predecessors in that they rely on an underlying Bayesian probability model rather than a pure algorithm. BART has demonstrated substantial promise in a wide variety of simulations and real world applications such as predicting avalanches on mountain roads (Blattenberger and Fowles 2014), predicting how transcription factors interact with DNA (Zhou and Liu 2008) and predicting movie box office revenues (Eliashberg 2010). This paper introduces bartMachine, a new R (R Core Team 2014) package available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package


rFerns: An Implementation of the Random Ferns Method for General-Purpose Machine Learning

arXiv.org Artificial Intelligence

Random ferns is a machine learning algorithm proposed by [11] for matching the same elements between two images of the same scene, allowing one to recognise certain objects or trace them on videos. The original motivation behind this method was to create a simple and efficient algorithm by extending the Naïve Bayes classifier; still, the authors acknowledged its strong connection to decision tree ensembles like the Random forest [2] algorithm. Since their introduction, Random ferns have been applied in numerous computer vision applications, like image recognition [1], action recognition [10] or augmented reality [14]. However, the method has not gathered attention outside this field; thus, this work aims to bring this algorithm to a much wider spectrum of applications. In order to do that, I propose a generalised version of the algorithm, implemented as an R [13] package rFerns. The paper is organised as follows. Section 2 briefly recalls the Bayesian derivation of the original version of Random ferns, presents the decision tree ensemble interpretation of the algorithm and lists modifications leading to the rFerns variant.
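The general idea of a fern ensemble as an extension of Naïve Bayes can be sketched as follows. This is a minimal illustration of the generic technique, not the rFerns implementation or its R interface: the function names, the random-threshold tests on features scaled to [0, 1], the Laplace smoothing, and the deliberately trivial toy data are all assumptions.

```python
import random
from math import log


def make_fern(n_features, depth, rng):
    """A fern is `depth` random (feature index, threshold) binary tests."""
    return [(rng.randrange(n_features), rng.random()) for _ in range(depth)]


def leaf_index(fern, x):
    """Concatenate the test outcomes into an integer in [0, 2**depth)."""
    index = 0
    for feature, threshold in fern:
        index = (index << 1) | (x[feature] > threshold)
    return index


def train(X, y, n_ferns=10, depth=3, seed=0):
    """Count, per fern and per class, how many training objects reach each leaf."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    ferns = [make_fern(len(X[0]), depth, rng) for _ in range(n_ferns)]
    # Counts start at 1 (Laplace smoothing) so that no leaf probability is zero.
    counts = [{c: [1] * (2 ** depth) for c in classes} for _ in ferns]
    for x, label in zip(X, y):
        for f, fern in enumerate(ferns):
            counts[f][label][leaf_index(fern, x)] += 1
    return ferns, counts, classes


def predict(model, x):
    """Sum log P(leaf | class) over ferns, treating ferns as independent."""
    ferns, counts, classes = model

    def score(c):
        total = 0.0
        for f, fern in enumerate(ferns):
            leaf_counts = counts[f][c]
            total += log(leaf_counts[leaf_index(fern, x)] / sum(leaf_counts))
        return total

    return max(classes, key=score)


# Hypothetical, trivially separable data with features in [0, 1].
X = [[0.1, 0.1]] * 5 + [[0.9, 0.9]] * 5
y = ["a"] * 5 + ["b"] * 5

model = train(X, y)
print(predict(model, [0.1, 0.1]), predict(model, [0.9, 0.9]))
```

Unlike a decision tree, every level of a fern applies the same test to all objects, so a fern of depth D is simply a lookup table with 2**D cells; the ensemble combines these tables in Naïve Bayes fashion.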


Use of Patient Generated Data from Social Media and Collaborative Filtering for Preferences Elicitation in Shared Decision Making

AAAI Conferences

With the increasing demand for personalization in clinical decision support systems, one of the most challenging tasks is the effective elicitation of patient preferences. In the context of the MobiGuide project, within a medical application related to atrial fibrillation, a decision support system has been developed for both doctors and patients. In particular, we support shared decision-making by integrating decision tree models with a dedicated tool for eliciting utility coefficients. In this paper we focus on the decision problem regarding the choice of anticoagulant therapy for low-risk non-valvular atrial fibrillation patients. In addition to the traditional methods, such as time trade-off and standard gamble, an alternative approach to preference elicitation is proposed, exploiting patients’ self-reported data in health-related social media as the main source of information.


A random forest system combination approach for error detection in digital dictionaries

arXiv.org Machine Learning

When digitizing a print bilingual dictionary, whether via optical character recognition or manual entry, it is inevitable that errors are introduced into the electronic version that is created. We investigate automating the process of detecting errors in an XML representation of a digitized print dictionary using a hybrid approach that combines rule-based, feature-based, and language model-based methods. We investigate combining methods and show that using random forests is a promising approach. We find that in isolation, unsupervised methods rival the performance of supervised methods. Random forests typically require training data, so we investigate how random forests can be applied to combine individual base methods that are themselves unsupervised without requiring large amounts of training data. Experiments show that a relatively small amount of data is sufficient and can potentially be further reduced through specific selection criteria.