AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Reliable ABC model choice via random forests

Pudlo, Pierre, Marin, Jean-Michel, Estoup, Arnaud, Cornuet, Jean-Marie, Gautier, Mathieu, Robert, Christian P.

arXiv.org Machine LearningSep-2-2015

Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP for a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf available on the CRAN.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1406.6288

Country:

North America > United States (0.28)
Europe > France (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Add feedback

ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

Wright, Marvin N., Ziegler, Andreas

arXiv.org Machine LearningAug-18-2015

We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.

artificial intelligence, machine learning, ranger, (18 more...)

arXiv.org Machine Learning

1508.04409

Country:

North America > United States (0.28)
Europe > Germany (0.28)
Europe > Austria (0.28)

Genre: Research Report > Experimental Study (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.85)

Add feedback

Causal Decision Trees

Li, Jiuyong, Ma, Saisai, Le, Thuc Duy, Liu, Lin, Liu, Jixue

arXiv.org Artificial IntelligenceAug-16-2015

Uncovering causal relationships in data is a major objective of data analytics. Causal relationships are normally discovered with designed experiments, e.g. randomised controlled trials, which, however are expensive or infeasible to be conducted in many cases. Causal relationships can also be found using some well designed observational studies, but they require domain experts' knowledge and the process is normally time consuming. Hence there is a need for scalable and automated methods for causal relationship exploration in data. Classification methods are fast and they could be practical substitutes for finding causal signals in data. However, classification methods are not designed for causal discovery and a classification method may find false causal signals and miss the true ones. In this paper, we develop a causal decision tree where nodes have causal interpretations. Our method follows a well established causal inference framework and makes use of a classic statistical test. The method is practical for finding causal signals in large data sets.

artificial intelligence, causal relationship, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TKDE.2016.2619350

1508.03812

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > Australia > South Australia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.75)

Add feedback

Crime Prediction Based On Crime Types And Using Spatial And Temporal Criminal Hotspots

Almanie, Tahani, Mirza, Rsha, Lor, Elizabeth

arXiv.org Artificial IntelligenceAug-9-2015

This paper focuses on finding spatial and temporal criminal hotspots. It analyses two different real-world crimes datasets for Denver, CO and Los Angeles, CA and provides a comparison between the two datasets through a statistical analysis supported by several graphs. Then, it clarifies how we conducted Apriori algorithm to produce interesting frequent patterns for criminal hotspots. In addition, the paper shows how we used Decision Tree classifier and Naive Bayesian classifier in order to predict potential crime types. To further analyse crimes datasets, the paper introduces an analysis study by combining our findings of Denver crimes dataset with its demographics information in order to capture the factors that might affect the safety of neighborhoods. The results of this solution could be used to raise awareness regarding the dangerous locations and to help agencies to predict future crimes in a specific location within a particular time.

artificial intelligence, decision tree learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.5121/ijdkp.2015.5401

1508.0205

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.75)
North America > United States > Colorado > Denver County > Denver (0.34)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(3 more...)

Genre: Research Report > New Finding (0.35)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
(2 more...)

Add feedback

Consistency of random forests

Scornet, Erwan, Biau, Gérard, Vert, Jean-Philippe

arXiv.org Machine LearningAug-8-2015

Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty to simultaneously analyze both the randomization process and the highly data-dependent tree structure. In the present paper, we take a step forward in forest exploration by proving a consistency result for Breiman's [Mach. Learn. 45 (2001) 5--32] original algorithm in the context of additive regression models. Our analysis also sheds an interesting light on how random forests can nicely adapt to sparsity. 1. Introduction. Random forests are an ensemble learning method for classification and regression that constructs a number of randomized decision trees during the training phase and predicts by averaging the results. Since its publication in the seminal paper of Breiman (2001), the procedure has become a major data analysis tool, that performs well in practice in comparison with many standard methods. What has greatly contributed to the popularity of forests is the fact that they can be applied to a wide range of prediction problems and have few parameters to tune. Aside from being simple to use, the method is generally recognized for its accuracy and its ability to deal with small sample sizes, high-dimensional feature spaces and complex data structures. The random forest methodology has been successfully involved in many practical problems, including air quality prediction (winning code of the EMC data science global hackathon in 2012, see http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al. (2003)], ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)], 3D

artificial intelligence, machine learning, random forest, (17 more...)

arXiv.org Machine Learning

doi: 10.1214/15-AOS1321

1405.2881

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)

Add feedback

Predicting Occupancy Trends in Barcelona's Bicycle Service Stations Using Open Data

Dias, Gabriel Martins, Bellalta, Boris, Oechsner, Simon

arXiv.org Artificial IntelligenceAug-6-2015

In 2008, the CEO of the company that manages and maintains the public bicycle service in Barcelona recognized that one may not expect to always find a place to leave the rented bike nearby their destination, similarly to the case when, driving a car, people may not find a parking lot. In this work, we make predictions about the statuses of the stations of the public bicycle service in Barcelona. We show that it is feasible to correctly predict nearly half of the times when the stations are either completely full of bikes or completely empty, up to 2 days before they actually happen. That is, users might avoid stations at times when they could not return a bicycle that they have rented before, or when they would not find a bike to rent. To achieve that, we apply the Random Forest algorithm to classify the status of the stations and improve the lifetime of the models using publicly available data, such as information about the weather forecast. Finally, we expect that the results of the predictions can be used to improve the quality of the service and make it more reliable for the users.

artificial intelligence, machine learning, prediction, (14 more...)

arXiv.org Artificial Intelligence

1505.03662

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.07)
Europe > United Kingdom > England > Greater London > London (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Transportation (0.47)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)

Add feedback

Banzhaf Random Forests

Sun, Jianyuan, Zhong, Guoqiang, Dong, Junyu, Cai, Yajuan

arXiv.org Machine LearningJul-22-2015

Random forests are a type of ensemble method which makes predictions by combining the results of several independent trees. However, the theory of random forests has long been outpaced by their application. In this paper, we propose a novel random forests algorithm based on cooperative game theory. Banzhaf power index is employed to evaluate the power of each feature by traversing possible feature coalitions. Unlike the previously used information gain rate of information theory, which simply chooses the most informative feature, the Banzhaf power index can be considered as a metric of the importance of each feature on the dependency among a group of features. More importantly, we have proved the consistency of the proposed algorithm, named Banzhaf random forests (BRF). This theoretical analysis takes a step towards narrowing the gap between the theory and practice of random forests for classification problems. Experiments on several UCI benchmark data sets show that BRF is competitive with state-of-the-art classifiers and dramatically outperforms previous consistent random forests. Particularly, it is much more efficient than previous consistent random forests.

artificial intelligence, machine learning, survey article, (17 more...)

arXiv.org Machine Learning

1507.06105

Country: Asia > China (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Leisure & Entertainment > Games (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Training-Time Optimization of a Budgeted Booster

Huang, Yi (University of Illinios at Chicago) | Powers, Brian (University of Illinios at Chicago) | Reyzin, Lev (University of Illinios at Chicago)

AAAI ConferencesJul-15-2015

We consider the problem of feature-efficient prediction - a setting where features have costs and the learner is limited by a budget constraint on the total cost of the features it can examine in test time. We focus on solving this problem with boosting by optimizing the choice of base learners in the training phase and stopping the boosting process when the learner's budget runs out. We experimentally show that our method improves upon the boosting approach AdaBoostRS [Reyzin, 2011] and in many cases also outperforms the recent algorithm SpeedBoost [Grubb and Bagnell, 2012]. We provide a theoretical justication for our optimization method via the margin bound. We also experimentally show that our method outperforms pruned decision trees, a natural budgeted classifier.

adaboost, budget, learner, (15 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

Positive, Negative, or Neutral: Learning an Expanded Opinion Lexicon from Emoticon-Annotated Tweets

Bravo-Marquez, Felipe (The University of Waikato) | Frank, Eibe (The University of Waikato) | Pfahringer, Bernhard (The University of Waikato)

AAAI ConferencesJul-15-2015

We present a supervised framework for expanding an opinion lexicon for tweets. The lexicon contains part-of-speech (POS) disambiguated entries with a three-dimensional probability distribution for positive, negative, and neutral polarities. To obtain this distribution using machine learning, we propose word-level attributes based on POS tags and information calculated from streams of emoticon-annotated tweets. Our experimental results show that our method outperforms the three-dimensional word-level polarity classification performance obtained by semantic orientation, a state-of-the-art measure for establishing world-level sentiment.

lexicon, lexicon expansion, tweet, (13 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Oceania > New Zealand > North Island > Waikato (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada (0.04)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.91)
(3 more...)

Add feedback

Understanding Random Forests: From Theory to Practice

Louppe, Gilles

arXiv.org Machine LearningJun-3-2015

Data analysis and machine learning have become an integrative part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and providing insights about the problem. Yet, caution should avoid using machine learning as a black-box tool, but rather consider it as a methodology, with a rational thought process that is entirely dependent on the problem under study. In particular, the use of algorithms should ideally require a reasonable understanding of their mechanisms, properties and limitations, in order to better apprehend and interpret their results. Accordingly, the goal of this thesis is to provide an in-depth analysis of random forests, consistently calling into question each and every part of the algorithm, in order to shed new light on its learning capabilities, inner workings and interpretability. The first part of this work studies the induction of decision trees and the construction of ensembles of randomized trees, motivating their design and purpose whenever possible. Our contributions follow with an original complexity analysis of random forests, showing their good computational performance and scalability, along with an in-depth discussion of their implementation details, as contributed within Scikit-Learn. In the second part of this work, we analyse and discuss the interpretability of random forests in the eyes of variable importance measures. The core of our contributions rests in the theoretical characterization of the Mean Decrease of Impurity variable importance measure, from which we prove and derive some of its properties in the case of multiway totally randomized trees and in asymptotic conditions. In consequence of this work, our analysis demonstrates that variable importances [...].

artificial intelligence, machine learning, survey article, (21 more...)

arXiv.org Machine Learning

1407.7502

Country:

North America > United States > California (0.45)
North America > United States > Michigan (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Summary/Review (0.92)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

Add feedback