Goto

Collaborating Authors

 Decision Tree Learning


Random Forests: some methodological insights

arXiv.org Machine Learning

This paper examines from an experimental perspective random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming, known but sparse, advice for using random forests and at proposing some complementary remarks for both standard problems as well as high dimensional ones for which the number of variables hugely exceeds the sample size. But the main contribution of this paper is twofold: to provide some insights about the behavior of the variable importance index based on random forests and in addition, to propose to investigate two classical issues of variable selection. The first one is to find important variables for interpretation and the second one is more restrictive and try to design a good prediction model. The strategy involves a ranking of explanatory variables using the random forests score of importance and a stepwise ascending variable introduction strategy.


Anytime Induction of Low-cost, Low-error Classifiers: a Sampling-based Approach

Journal of Artificial Intelligence Research

Machine learning techniques are gaining prevalence in the production of a wide range of classifiers for complex real-world applications with nonuniform testing and misclassification costs. The increasing complexity of these applications poses a real challenge to resource management during learning and classification. In this work we introduce ACT (anytime cost-sensitive tree learner), a novel framework for operating in such complex environments. ACT is an anytime algorithm that allows learning time to be increased in return for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques, ACT approximates the cost of the subtree under each candidate split and favors the one with a minimal cost. As a stochastic algorithm, ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare ACT to the state-of-the-art cost-sensitive tree learners. The results show that for the majority of domains ACT produces significantly less costly trees. ACT also exhibits good anytime behavior with diminishing returns.


A Too-Clever Ranking Method

AI Magazine

I developed what I scored, and those with the lowest scores could be removed before running C4.5 to build a decision tree with the remainder. I ran an experiment in which I removed the bottom 10 percent of the instances in a University of California, Irvine (UCI) data set. The resulting tree was smaller and more accurate (as measured by 10-fold CV) than the tree built on the full data set. Then I removed the bottom 20 percent of the instances and got a tree that was smaller than the last one and just as accurate. At that point I had the feeling that this was going to make a great paper for the International Conference on Machine Learning (ICML).


Spectrum of Variable-Random Trees

Journal of Artificial Intelligence Research

In this paper, we show that a continuous spectrum of randomisation exists, in which most existing tree randomisations are only operating around the two ends of the spectrum. That leaves a huge part of the spectrum largely unexplored. We propose a base learner VR-Tree which generates trees with variable-randomness. VR-Trees are able to span from the conventional deterministic trees to the complete-random trees using a probabilistic parameter. Using VR-Trees as the base models, we explore the entire spectrum of randomised ensembles, together with Bagging and Random Subspace. We discover that the two halves of the spectrum have their distinct characteristics; and the understanding of which allows us to propose a new approach in building better decision tree ensembles. We name this approach Coalescence, which coalesces a number of points in the random-half of the spectrum. Coalescence acts as a committee of ``experts'' to cater for unforeseeable conditions presented in training data. Coalescence is found to perform better than any single operating point in the spectrum, without the need to tune to a specific level of randomness. In our empirical study, Coalescence ranks top among the benchmarking ensemble methods including Random Forests, Random Subspace and C5 Boosting; and only Coalescence is significantly better than Bagging and Max-Diverse Ensemble among all the methods in the comparison. Although Coalescence is not significantly better than Random Forests, we have identified conditions under which one will perform better than the other.


Fast Discriminative Visual Codebooks using Randomized Clustering Forests

Neural Information Processing Systems

Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests - ensembles of randomly created clustering trees - and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks.


Fast Discriminative Visual Codebooks using Randomized Clustering Forests

Neural Information Processing Systems

Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests - ensembles of randomly created clustering trees - and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks.


Fast Discriminative Visual Codebooks using Randomized Clustering Forests

Neural Information Processing Systems

Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests - ensembles of randomly created clustering trees - and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks.


Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text

Journal of Artificial Intelligence Research

It is well known that utterances convey a great deal of information about the speaker in addition to their semantic content. One such type of information consists of cues to the speaker's personality traits, the most fundamental dimension of variation between humans. Recent work explores the automatic detection of other types of pragmatic variation in text and conversation, such as emotion, deception, speaker charisma, dominance, point of view, subjectivity, opinion and sentiment. Personality affects these other aspects of linguistic production, and thus personality recognition may be useful for these tasks, in addition to many other potential applications. However, to date, there is little work on the automatic recognition of personality traits. This article reports experimental results for recognition of all Big Five personality traits, in both conversation and text, utilising both self and observer ratings of personality. While other work reports classification results, we experiment with classification, regression and ranking models. For each model, we analyse the effect of different feature sets on accuracy. Results show that for some traits, any type of statistical model performs significantly better than the baseline, but ranking models perform best overall. We also present an experiment suggesting that ranking models are more accurate than multi-class classifiers for modelling personality. In addition, recognition models trained on observed personality perform better than models trained using self-reports, and the optimal feature set depends on the personality trait. A qualitative analysis of the learned models confirms previous findings linking language and personality, while revealing many new linguistic markers.


A Comparison of Different Machine Transliteration Models

Journal of Artificial Intelligence Research

Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models -- grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model -- have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance.


Using Random Forests in the Structured Language Model

Neural Information Processing Systems

In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing.