AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
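To make the quoted idea concrete, here is a minimal sketch of a decision tree that outputs a yes/no decision. The tuple encoding and the attribute names (`raining`, `weekend`) are assumptions made for illustration and do not come from the book.

```python
# A decision tree as nested tuples: (attribute, subtree_if_false, subtree_if_true).
# Leaves are plain booleans. Attribute names are hypothetical.
TREE = ("raining",
        ("weekend", False, True),  # not raining: say yes only on weekends
        False)                     # raining: always say no


def decide(tree, example):
    """Test one attribute per internal node until a boolean leaf is reached."""
    while not isinstance(tree, bool):
        attribute, if_false, if_true = tree
        tree = if_true if example[attribute] else if_false
    return tree


# The tree represents the Boolean function (not raining) and weekend.
for raining in (False, True):
    for weekend in (False, True):
        example = {"raining": raining, "weekend": weekend}
        assert decide(TREE, example) == ((not raining) and weekend)
```

Each root-to-leaf path corresponds to one conjunction of attribute tests, which is why a tree of this kind can be read as a Boolean function.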

Score Spark-built machine learning models

#artificialintelligence | Apr-21-2016, 01:10:34 GMT

This topic describes how to load machine learning (ML) models that have been built using Spark MLlib and stored in Azure Blob Storage (WASB), and how to score them with datasets that have also been stored in WASB. It shows how to pre-process the input data, how to transform features using the indexing and encoding functions in the MLlib toolkit, and how to create a labeled point data object that can be used as input for scoring with the ML models. The models used for scoring include Linear Regression, Logistic Regression, Random Forest Models, and Gradient Boosting Tree Models. You need an Azure account and an HDInsight Spark cluster to begin this walkthrough. See the Overview of Data Science using Spark on Azure HDInsight for these requirements, for a description of the NYC 2013 Taxi data used here, and for instructions on how to execute code from a Jupyter notebook on the Spark cluster.

artificial intelligence, machine learning, spark cluster, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)

Random-ized Forest: A new class of Ensemble algorithms

#artificialintelligence | Apr-20-2016, 23:25:50 GMT

Bagging (an ensemble technique) is known to work well on unstable algorithms such as decision trees and artificial neural networks, and not on stable algorithms such as Naive Bayes. The well-known ensemble algorithm random forest builds on bagging, which leverages the 'instability' of decision trees to help build a better classifier. Even though random forest attempts to handle the issues caused by highly correlated trees, does it completely solve them? Can the decision trees be made more unstable than they are in a random forest, so that the learner is even more accurate? If trees are sufficiently deep, they have very low bias.
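As a rough illustration of bagging, the sketch below trains one-split trees (decision stumps) on bootstrap samples and combines them by majority vote. The toy data, the stump learner, and all function names are assumptions made for this example and are not taken from the article.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D stump (x < t gives `left`, otherwise `right`) by exhaustive search."""
    xs = sorted({x for x, _ in points})
    candidates = [xs[0] - 0.5] + [x + 0.5 for x in xs]
    best = None
    for t in candidates:
        for left, right in ((1, -1), (-1, 1)):
            errors = sum((left if x < t else right) != y for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x < t else right


def bagging(points, n_models, rng):
    """Train each model on a bootstrap sample (same size, drawn with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def vote(models, x):
    """Majority vote over the individual model predictions."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Toy data (hypothetical): label +1 for x < 5, otherwise -1.
POINTS = [(x, 1 if x < 5 else -1) for x in range(10)]
MODELS = bagging(POINTS, n_models=25, rng=random.Random(0))
print([vote(MODELS, x) for x in (0, 9)])
```

Because each bootstrap sample omits some points, the fitted thresholds differ from stump to stump; the vote averages out that variation, which is the effect bagging relies on.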

artificial intelligence, machine learning, step factor, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Predicting Prices in the Power TAC Wholesale Energy Market

Chowdhury, Moinul Morshed Porag (The University of Texas at El Paso)

AAAI Conferences | Apr-19-2016

The Power TAC simulation emphasizes the strategic problems that broker agents face in managing the economics of a smart grid. The brokers must make trades in multiple markets, and to be successful they must make many good predictions about future supply, demand, and prices. Clearing price prediction is an important part of the broker's wholesale market strategy because it helps the broker to make intelligent decisions when purchasing energy at low cost in a day-ahead market. I describe my work on using machine learning methods to predict prices in the Power TAC wholesale market, which will be used in future bidding strategies.

artificial intelligence, machine learning, simulation, (16 more...)

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States > Texas (0.15)

Industry:

Energy (1.00)
Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.32)

Sparse Perceptron Decision Tree for Millions of Dimensions

Liu, Weiwei (University of Technology) | Tsang, Ivor W. (University of Technology)

AAAI Conferences | Apr-19-2016

Due to their nonlinear but highly interpretable representations, decision tree (DT) models have attracted considerable attention from researchers. However, DT models usually suffer from the curse of dimensionality and achieve degraded performance when there are many noisy features. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification to learn a sparse linear hyperplane in each decision node and to prune the tree. Following our analysis, we introduce the notion of sparse perceptron decision node (SPDN) with a budget constraint on the weight coefficients, and propose a sparse perceptron decision tree (SPDT) algorithm to achieve nonlinear prediction performance. To avoid generating an unstable and complicated decision tree and improve the generalization of the SPDT, we present a pruning strategy by learning classifiers to minimize cross-validation errors on each SPDN. Extensive empirical studies verify that our SPDT is more resilient to noisy features and effectively generates a small, yet accurate decision tree. Compared with state-of-the-art DT methods and SVM, our SPDT achieves better generalization performance on ultrahigh dimensional problems with more than 1 million features.

artificial intelligence, decision tree, machine learning, (16 more...)

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Random Composite Forests

DeSalvo, Giulia (Courant Institute of Mathematical Sciences) | Mohri, Mehryar (Courant Institute of Mathematical Sciences and Google Research)

AAAI Conferences | Apr-19-2016

We introduce a broad family of decision trees, Composite Trees, whose leaf classifiers are selected out of a hypothesis set composed of p subfamilies with different complexities. We prove new data-dependent learning guarantees for this family in the multi-class setting. These learning bounds provide quantitative guidance for the choice of the hypotheses at each leaf. Remarkably, they depend on the Rademacher complexities of the subfamilies of the predictors and the fraction of sample points correctly classified at each leaf. We further introduce random composite trees and derive learning guarantees for them which also apply to Random Forests. Using our theoretical analysis, we devise a new algorithm, RANDOMCOMPOSITEFORESTS (RCF), that is based on forming an ensemble of random composite trees. We report the results of experiments demonstrating that RCF yields significant performance improvements over both Random Forests and a variant of RCF in several tasks.

algorithm, artificial intelligence, machine learning, (18 more...)

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.28)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

A (small) introduction to Boosting

#artificialintelligence | Apr-18-2016, 04:10:51 GMT

Boosting is a machine learning meta-algorithm that aims to iteratively build an ensemble of weak learners, in an attempt to generate a strong overall model. For example, consider a problem of binary classification with approximately 50% of samples belonging to each class. Random guessing in this case would yield an accuracy of around 50%. So a weak learner would be any algorithm, however simple, that slightly improves this score – say 51-55% or more. Usually, weak learners are pretty basic in nature.
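The idea of combining weak learners can be sketched with AdaBoost, one well-known boosting algorithm (the excerpt above does not name a specific one, so this choice is an assumption). The weak learner here is a one-split decision stump, and the toy data and function names are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` when x < threshold, else -polarity."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    thresholds = [min(xs) - 0.5] + [x + 0.5 for x in sorted(xs)]
    for threshold in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Build a weighted ensemble of stumps, re-weighting the samples each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


def correct_count(ensemble, xs, ys):
    return sum(ensemble_predict(ensemble, x) == y for x, y in zip(xs, ys))


# Toy 1-D data that no single stump can separate.
XS = list(range(10))
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = adaboost(XS, YS, rounds=1)
three_rounds = adaboost(XS, YS, rounds=3)
print(correct_count(one_round, XS, YS), correct_count(three_rounds, XS, YS))
```

On this toy data the best single stump classifies 7 of the 10 points correctly, while three boosted stumps together classify all 10, which is the weak-to-strong effect the description refers to.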

artificial intelligence, decision tree learning, machine learning, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.32)

Tuning Parameters for Boosting/Bagging/Random Forest • /r/MachineLearning

@machinelearnbot | Apr-17-2016, 21:05:11 GMT

Random forests usually perform quite well with the default settings: a bootstrap resampling scheme, unpruned trees, as many trees as you can fit in a reasonable amount of time, and sqrt(#features) features tried per split (the mtry parameter). You can then try to optimize these choices by checking the results on out-of-bag data (the samples each tree didn't train on because of the resampling scheme). If you have very unbalanced classes, you should decide on a measure of interest (such as the true positive rate) and tune the related parameter. Out-of-bag data can be trusted almost as much as a proper cross-validation if you use enough trees and bootstrap resampling.
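The out-of-bag idea can be made concrete with a short sketch. A bootstrap sample of size n leaves each point out with probability (1 - 1/n)^n, which approaches e^-1 (about 0.368) as n grows, so roughly a third of the data is available to evaluate each tree. The function names below are hypothetical, and rounding sqrt(#features) down is an assumption about how the default is applied.

```python
import math
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def default_mtry(n_features):
    """Features tried per split: sqrt(#features), rounded down (assumed), at least 1."""
    return max(1, math.isqrt(n_features))


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_split(1000, rng)
print(len(out_of_bag) / 1000)  # close to (1 - 1/1000) ** 1000
print(default_mtry(16))
```

Each tree can then be scored on its own out-of-bag indices, and averaging those scores over many trees gives the estimate that the comment compares to cross-validation.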

artificial intelligence, decision tree learning, tuning parameter, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Mondrian forests: Efficient random forests for streaming data via Bayesian nonparametrics

#artificialintelligence | Apr-15-2016, 08:41:02 GMT

Ensembles of randomized decision trees are widely used for classification and regression tasks in machine learning and statistics. They achieve competitive predictive performance and are computationally efficient to train (batch setting) and test, making them excellent candidates for real world prediction tasks. However, the most popular variants (such as Breiman's random forest and extremely randomized trees) work only in the batch setting and cannot handle streaming data easily. In this talk, I will present Mondrian Forests, where random decision trees are generated from a Bayesian nonparametric model called a Mondrian process (Roy and Teh, 2009). Making use of the remarkable consistency properties of the Mondrian process, we develop a variant of extremely randomized trees that can be constructed in an incremental fashion efficiently, thus making their use on streaming data simple and efficient.

artificial intelligence, decision tree learning, machine learning, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

What Developers Actually Need to Know About Machine Learning

#artificialintelligence | Apr-15-2016, 06:56:23 GMT

Something is wrong in the way ML is being taught to developers. Most ML teachers like to explain how different learning algorithms work and spend tons of time on that. For a beginner who wants to start using ML, being able to choose an algorithm and set parameters looks like the #1 barrier to entry, and knowing how the different techniques work seems to be a key requirement to remove that barrier. Many practitioners argue, however, that you only need one technique to get started: random forests. Other techniques may sometimes outperform them, but in general, random forests are the most likely to perform best on a variety of problems (see Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?), which makes them more than enough for a developer just getting started with ML.

artificial intelligence, decision tree learning, machine learning, (14 more...)

Genre: Instructional Material (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)

Visualizing a Decision Tree - Machine Learning Recipes #2

#artificialintelligence | Apr-15-2016, 05:10:36 GMT

Last episode, we treated our Decision Tree as a blackbox. In this episode, we'll build one on a real dataset, add code to visualize it, and practice reading it - so you can see how it works under the hood. And hey -- I may have gone a little fast through some parts. Just let me know, I'll slow down. Also: we'll do a Q&A episode down the road, so if anything is unclear, just ask! Subscribe to the Google Developers: http://goo.gl/mQyv5L

artificial intelligence, decision tree learning, social media, (2 more...)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
