AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
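
Here is a minimal Python sketch of the quoted definition: a tree of branching tests that outputs a yes/no decision. The class names `Leaf` and `Test`, the function `decide`, and the attributes `a` and `b` are hypothetical and chosen only for illustration. The example tree computes the exclusive OR of two Boolean properties. No single test can express that function, but a depth-2 tree represents it exactly.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    decision: bool  # the yes/no output returned when this leaf is reached


@dataclass
class Test:
    attribute: str    # name of the Boolean property tested at this node
    if_true: "Node"   # subtree followed when the property holds
    if_false: "Node"  # subtree followed otherwise


Node = Union[Leaf, Test]


def decide(node: Node, situation: Dict[str, bool]) -> bool:
    """Follow branching tests from the root until a leaf gives the decision."""
    while isinstance(node, Test):
        node = node.if_true if situation[node.attribute] else node.if_false
    return node.decision


# Exclusive OR of two properties, represented as a depth-2 tree.
xor_tree = Test("a",
                if_true=Test("b", Leaf(False), Leaf(True)),
                if_false=Test("b", Leaf(True), Leaf(False)))

for a in (False, True):
    for b in (False, True):
        print(a, b, decide(xor_tree, {"a": a, "b": b}))
```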

Want to Win at Kaggle? Pay Attention to Your Ensembles.

#artificialintelligence | May-28-2016, 23:30:45 GMT

Summary: Want to win a Kaggle competition, or at least get a respectable place on the leaderboard? These days it's all about ensembles, and for a lot of practitioners that means reaching for random forests. Random forests have indeed been very successful, but it's worth remembering that there are three different categories of ensembles, each with some important hyperparameter tuning issues. Here's a brief review. The Kaggle competitions are like formula racing for data science. Winners edge out competitors at the fourth decimal place and, like Formula 1 race cars, not many of us would mistake them for daily drivers.

artificial intelligence, classifier, machine learning

#artificialintelligence

Industry: Leisure & Entertainment > Sports > Motorsports (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.57)

The difference between machine learning and statistics in data mining

@machinelearnbot | May-26-2016, 22:00:18 GMT

Cynics, looking wryly at the explosion of commercial interest (and hype) in this area, equate data mining to statistics plus marketing. In truth, you should not look for a dividing line between machine learning and statistics because there is a continuum -- and a multidimensional one at that -- of data analysis techniques. Some derive from the skills taught in standard statistics courses, and others are more closely associated with the kind of machine learning that has arisen out of computer science. Historically, the two sides have had rather different traditions. If forced to point to a single difference of emphasis, it might be that statistics has been more concerned with testing hypotheses, whereas machine learning has been more concerned with formulating the process of generalization as a search through possible hypotheses.

artificial intelligence, decision tree learning, machine learning and statistics

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.56)

Evasion and Hardening of Tree Ensemble Classifiers

Alex Kantchelian, J. D. Tygar, Anthony D. Joseph

arXiv.org Machine Learning | May-26-2016

Classifier evasion consists of finding, for a given instance $x$, the nearest instance $x'$ such that the classifier's predictions for $x$ and $x'$ differ. We present two novel algorithms for systematically computing evasions for tree ensembles such as boosted trees and random forests. Our first algorithm uses a Mixed Integer Linear Program solver and finds the optimal evading instance under an expressive set of constraints. Our second algorithm trades off optimality for speed by using symbolic prediction, a novel algorithm for fast finite differences on tree ensembles. On a digit recognition task, we demonstrate that both gradient boosted trees and random forests are extremely susceptible to evasions. Finally, we harden a boosted tree model without loss of predictive accuracy by augmenting the training set of each boosting round with evading instances, a technique we call adversarial boosting.
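
The piecewise-constant structure that makes tree ensembles vulnerable can be seen in a much simpler setting than the paper's. The sketch below is not the authors' MILP formulation or their symbolic prediction algorithm. It assumes an ensemble of one-split trees (stumps) whose scores are summed, and it only searches for evasions that change a single feature. Because the ensemble output changes only at split thresholds, trying each threshold, plus a point just past it, is enough to find the closest single-feature evasion, up to the hypothetical step `eps`. The stump encoding and all function names are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# A stump: (feature index, threshold, score if x[f] <= threshold, score otherwise).
Stump = Tuple[int, float, float, float]


def ensemble_score(stumps: List[Stump], x: List[float]) -> float:
    """Sum of stump outputs for instance x."""
    return sum(left if x[f] <= t else right for f, t, left, right in stumps)


def predict(stumps: List[Stump], x: List[float]) -> bool:
    """Positive class when the summed score is above zero."""
    return ensemble_score(stumps, x) > 0


def single_feature_evasion(stumps: List[Stump], x: List[float],
                           eps: float = 1e-6) -> Optional[List[float]]:
    """Closest x' (changing ONE feature) whose prediction differs from x's.

    The score is piecewise constant in each feature and only changes at split
    thresholds, so it suffices to try each threshold t (just inside the
    "<= t" side) and t + eps (just inside the "> t" side).
    Returns None if no single-feature change flips the prediction.
    """
    original = predict(stumps, x)
    best, best_dist = None, float("inf")
    for f, t, _, _ in stumps:
        for value in (t, t + eps):
            candidate = list(x)
            candidate[f] = value
            dist = abs(value - x[f])
            if dist < best_dist and predict(stumps, candidate) != original:
                best, best_dist = candidate, dist
    return best


stumps = [(0, 0.5, -1.0, 1.0), (1, 0.3, -0.5, 0.5)]
x = [0.8, 0.9]
x_adv = single_feature_evasion(stumps, x)
print(predict(stumps, x), x_adv, predict(stumps, x_adv))
```

Searching over several features at once, or under constraints, is where exhaustive enumeration stops scaling and a solver-based formulation like the paper's becomes attractive.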

artificial intelligence, constraint, machine learning

arXiv.org Machine Learning

1509.07892

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Highly Accurate Prediction of Jobs Runtime Classes

Anat Reiner-Benaim, Anna Grabarnick, Edi Shmueli

arXiv.org Machine Learning | May-23-2016

Supplying job schedulers with information on how long jobs are expected to run enabled the development of backfilling algorithms, which use this information to pack jobs more efficiently and improve system utilization [1]. These algorithms, however, were designed for parallel systems, in which jobs require many processors to execute and processor fragmentation (idleness) is a major concern. In those environments the scheduler needs to know the actual runtimes of the jobs (i.e., numeric predictions) to optimize the schedule and improve performance [10]. Our work targets systems in which most jobs are serial, such as server farms used for software testing. In those environments sophisticated scheduling algorithms are not required; to improve performance it is enough to separate the short jobs from the long ones and assign them to different queues [12]. This separation reduces the likelihood that short jobs will be delayed behind long ones, and improves both the average turnaround time of the jobs and overall system throughput.
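
The effect of separating short jobs from long ones can be illustrated with a toy calculation. This is a minimal sketch, not the authors' system. It assumes six hypothetical jobs with made-up runtimes, all submitted at time 0, a runtime-class predictor that is always correct, and two identical servers that are either shared through one FIFO queue or dedicated one per predicted class.

```python
import heapq
from typing import List, Tuple


def fifo_finish_times(runtimes: List[float], servers: int) -> List[float]:
    """Finish times when jobs (all submitted at t=0) are served FIFO by identical servers."""
    free_at = [0.0] * servers          # time at which each server becomes idle
    heapq.heapify(free_at)
    finishes = []
    for runtime in runtimes:
        start = heapq.heappop(free_at)  # the earliest idle server takes the next job
        finish = start + runtime
        finishes.append(finish)
        heapq.heappush(free_at, finish)
    return finishes


def average(xs: List[float]) -> float:
    return sum(xs) / len(xs)


# Hypothetical workload: (runtime, predicted class), in submission order.
jobs: List[Tuple[float, str]] = [(100, "long"), (100, "long"),
                                 (1, "short"), (1, "short"), (1, "short"), (1, "short")]

# One shared queue feeding both servers.
shared = fifo_finish_times([r for r, _ in jobs], servers=2)

# The same two servers, one dedicated to each predicted class.
short_q = fifo_finish_times([r for r, c in jobs if c == "short"], servers=1)
long_q = fifo_finish_times([r for r, c in jobs if c == "long"], servers=1)
separated = short_q + long_q

# Since every job is submitted at t=0, turnaround time equals finish time.
print(average(shared), average(separated))
```

With these numbers the shared queue gives every job a turnaround time between 100 and 102 (average 101), while the separated queues cut the short jobs' average from 101.5 to 2.5 and the overall average to about 51.7. The long jobs pay for it (their average rises from 100 to 150), so the net gain depends on the job mix.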

artificial intelligence, machine learning, runtime

arXiv.org Machine Learning

1605.00388

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Architecture > Distributed Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

OpenCV or OpenDT for decision trees? • /r/MachineLearning

@machinelearnbot | May-21-2016, 20:20:23 GMT

I have been working on a C project using decision trees for some time. We have been using OpenCV for the DT part, but since the OpenCV 3.1 code seems to be riddled with issues and is partially inaccessible, I am wondering whether an alternative, especially OpenDT (or maybe waffles), would be worthwhile.

artificial intelligence, decision tree learning, machine learning

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)

Randomized Forest: Thought vectors to build a new class of ensemble algorithms

#artificialintelligence | May-21-2016, 04:42:02 GMT

It is well known that bagging (an ensemble technique) works well on unstable algorithms such as decision trees and artificial neural networks, but not on stable algorithms such as Naive Bayes. The well-known ensemble algorithm random forest builds on bagging, which leverages the "instability" of decision trees to build a better classifier. Random forest attempts to handle the issues caused by highly correlated trees, but does it solve them completely? Can decision trees be made even more unstable than random forest makes them, so that the learner becomes even more accurate? If trees are sufficiently deep, they have very low bias.
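
The correlation issue raised here can be quantified with a standard identity. If $B$ trees produce predictions that each have variance $\sigma^2$ and pairwise correlation $\rho$, their average has variance $\rho\sigma^2 + (1-\rho)\sigma^2/B$. Adding trees shrinks only the second term, so correlation sets a floor that bagging alone cannot remove; random forest's feature subsampling aims to lower $\rho$. The minimal Python sketch below evaluates the formula and checks it by simulation. It assumes the "tree outputs" are correlated Gaussian noise rather than real trees, and the function names are hypothetical.

```python
import random
from statistics import pvariance


def averaged_variance(rho: float, sigma2: float, n_trees: int) -> float:
    """Variance of the mean of n_trees identically distributed predictions,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees


def simulate(rho: float, n_trees: int, draws: int = 5000, seed: int = 0) -> float:
    """Monte Carlo check with unit-variance 'tree outputs' built as
    sqrt(rho) * shared noise + sqrt(1 - rho) * per-tree noise,
    which gives every pair of outputs correlation rho."""
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        shared = rng.gauss(0, 1)
        outputs = [rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
                   for _ in range(n_trees)]
        means.append(sum(outputs) / n_trees)
    return pvariance(means)


for rho in (0.5, 0.1):
    print(rho, averaged_variance(rho, 1.0, 50), round(simulate(rho, 50), 3))
```

With $\rho = 0.5$ the variance cannot drop below 0.5 however many trees are averaged, whereas $\rho = 0.1$ lets it approach 0.1.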

artificial intelligence, machine learning, step factor

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Can a classification tree model "know" to predict only one of every class for many subsets of the data?

@machinelearnbot | May-20-2016, 16:25:18 GMT

To help readers understand my question, I'll add some context. I don't know a whole lot of the terminology, so please bear with me. Draper is hosting a competition on Kaggle to classify images by day. The chosen metrics were overall brightness and the number of similarities between images taken on different days. In addition, the "compared" variable was used as a categorical variable.

artificial intelligence, classification tree model, machine learning

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.53)

Add feedback

ABC random forests for Bayesian parameter inference

#artificialintelligence | May-20-2016, 07:45:45 GMT

Before leaving Helsinki, we arXived [from the Air France lounge!] the paper Jean-Michel presented on Monday at ABCruise in Helsinki. This paper summarises the experiments Louis conducted over the past months to assess the performance of a random forest regression approach to ABC parameter inference. I think the major incentives for exploiting the (still mysterious) tool of random forests [rather than more traditional ABC approaches, like Fearnhead and Prangle (2012) on summary selection] are that (i) forests do not require a preliminary selection of the summary statistics, since an arbitrary number of summaries can be used as input for the random forest, even when including a large number of useless white noise variables; (ii) there is no longer a tolerance level involved in the process, since the many trees in the random forest define a natural if rudimentary distance that corresponds to being or not being in the same leaf as the observed vector of summary statistics η(y). Deriving a different forest for each univariate transform of interest is then only a minor drag on the overall computing cost of the approach. An intriguing point we uncovered through Louis' experiments is that an unusual version of the variance estimator is preferable to the standard estimator: we indeed observed better estimation performance when using a weighted version of the out-of-bag residuals (which are computed as the differences between the simulated value of the parameter transforms and their expectation obtained by removing the random trees involving this simulated value). Another intriguing feature [to me] is that the regression weights as proposed by Meinshausen (2006) are obtained as an average of the inverse of the number of terms in the leaf of interest.
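
The remark about Meinshausen's weights can be written out. For a query $x$, training point $i$ receives $w_i(x) = \frac{1}{B}\sum_{b=1}^{B} \frac{\mathbb{1}[X_i \in L_b(x)]}{|L_b(x)|}$, where $L_b(x)$ is the leaf of tree $b$ containing $x$ and $|L_b(x)|$ is the number of training points in that leaf. The Python sketch below computes these weights from precomputed leaf indices. The leaf assignments and targets are hypothetical toy values, and the sketch does not reproduce the ABC random forest pipeline or the out-of-bag variance estimator described in the post.

```python
from collections import Counter
from typing import List


def forest_weights(train_leaves: List[List[int]], query_leaves: List[int]) -> List[float]:
    """Meinshausen-style weights w_i(x), one per training point.

    train_leaves[b][i] is the leaf that training point i falls into in tree b;
    query_leaves[b] is the leaf of the query x in tree b. In each tree, every
    training point sharing x's leaf gets 1 / (number of training points in that
    leaf); contributions are averaged over trees, so the weights sum to one
    (assuming each query leaf holds at least one training point, as in a fitted tree).
    """
    n_trees = len(train_leaves)
    weights = [0.0] * len(train_leaves[0])
    for leaves, q in zip(train_leaves, query_leaves):
        leaf_size = Counter(leaves)[q]
        for i, leaf in enumerate(leaves):
            if leaf == q:
                weights[i] += 1.0 / (leaf_size * n_trees)
    return weights


def weighted_prediction(weights: List[float], targets: List[float]) -> float:
    """Forest regression estimate at x as the weighted average of training targets."""
    return sum(w * y for w, y in zip(weights, targets))


# Two hypothetical trees, four training points.
train_leaves = [[0, 0, 1, 1], [0, 1, 1, 1]]
query_leaves = [0, 1]
w = forest_weights(train_leaves, query_leaves)
print(w, weighted_prediction(w, [1.0, 2.0, 3.0, 4.0]))
```

Here the weighted prediction, 2.25, equals the average over the two trees of the mean target in $x$'s leaf ((1 + 2)/2 = 1.5 and (2 + 3 + 4)/3 = 3), which is how these weights reproduce an ordinary forest's regression estimate.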

artificial intelligence, machine learning, random forest

#artificialintelligence

Country:

Europe > Finland > Uusimaa > Helsinki (0.51)
Europe > France (0.25)
Atlantic Ocean > North Atlantic Ocean > Baltic Sea (0.05)

Industry: Transportation > Air (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Visualizing a Decision Tree - Machine Learning Recipes #2

#artificialintelligence | May-16-2016, 16:11:43 GMT

Last episode, we treated our decision tree as a black box. In this episode, we'll build one on a real dataset, add code to visualize it, and practice reading it, so you can see how it works under the hood. And hey, I may have gone a little fast through some parts; just let me know and I'll slow down. Also: we'll do a Q&A episode down the road, so if anything is unclear, just ask! Follow https://twitter.com/random_forests
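
The episode's own code is not reproduced here. As a dependency-free illustration of what "reading" a tree means, the minimal Python sketch below prints a hand-built tree as nested if/else rules and classifies a sample by following one branch per test. The tree, its feature names, and its thresholds are illustrative assumptions, not a model learned from data, and `render` and `predict` are hypothetical helpers rather than a library API.

```python
from typing import Dict, List, Tuple, Union

# A node is either a class label (leaf) or (feature, threshold, left, right),
# where "left" is followed when sample[feature] <= threshold.
Tree = Union[str, Tuple[str, float, "Tree", "Tree"]]


def render(tree: Tree, depth: int = 0) -> List[str]:
    """Return indented if/else lines, one per test or leaf: a text 'visualization'."""
    pad = "    " * depth
    if isinstance(tree, str):
        return [f"{pad}predict {tree}"]
    feature, threshold, left, right = tree
    return ([f"{pad}if {feature} <= {threshold}:"] + render(left, depth + 1)
            + [f"{pad}else:  # {feature} > {threshold}"] + render(right, depth + 1))


def predict(tree: Tree, sample: Dict[str, float]) -> str:
    """Read the tree as a classifier would: follow one branch per test to a leaf."""
    while not isinstance(tree, str):
        feature, threshold, left, right = tree
        tree = left if sample[feature] <= threshold else right
    return tree


# Hand-built illustrative tree; the thresholds are made up, not learned.
toy = ("petal_width", 0.8, "setosa",
       ("petal_length", 4.9, "versicolor", "virginica"))

print("\n".join(render(toy)))
print(predict(toy, {"petal_width": 1.8, "petal_length": 5.5}))
```

Each printed line corresponds to one node, so reading a prediction amounts to tracing a single path from the top line down to a `predict` line.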

artificial intelligence, decision tree learning, machine learning recipe

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.70)

Add feedback

ONLamp.com: Building Decision Trees in Python

#artificialintelligence | May-13-2016, 06:50:07 GMT

The decision tree in the figure is just one of many decision tree structures you could create to solve the marketing problem. The task of finding the optimal decision tree is an intractable problem. For those of you who have taken an analysis of algorithms course, you no doubt recognize this term. For those of you who haven't had this pleasure (he says, gritting his teeth), essentially what this means is that no known algorithm finds the optimal tree efficiently: as the size of the problem (the number of attributes and training examples) grows, the time an exact search takes grows exponentially. While it may be nearly impossible to find the smallest (or more fittingly, the shallowest) decision tree in a respectable amount of time, it is possible to find a decision tree that is "small enough" using special heuristics.
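
One widely used heuristic of this kind is the greedy, ID3-style rule: at each node, pick the attribute whose split gives the largest information gain (reduction in label entropy), then recurse on each branch. It finds a reasonably small tree quickly but does not guarantee the smallest one. The Python sketch below assumes categorical attributes, and the marketing rows and attribute names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Union

Row = Dict[str, str]
Tree = Union[str, Dict[str, Dict[str, "Tree"]]]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows: List[Row], attribute: str, target: str) -> float:
    """Entropy reduction obtained by splitting rows on the values of `attribute`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def best_attribute(rows: List[Row], attributes: List[str], target: str) -> str:
    """Greedy choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


def build_tree(rows: List[Row], attributes: List[str], target: str) -> Tree:
    """Grow the tree greedily; a leaf is the majority class of its rows."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attributes, target)
    rest = [a for a in attributes if a != attr]
    return {attr: {v: build_tree([r for r in rows if r[attr] == v], rest, target)
                   for v in sorted({r[attr] for r in rows})}}


# Hypothetical marketing data: did the customer buy?
rows = [
    {"age": "young", "region": "north", "bought": "yes"},
    {"age": "young", "region": "south", "bought": "no"},
    {"age": "old", "region": "north", "bought": "yes"},
    {"age": "old", "region": "south", "bought": "no"},
]
print(build_tree(rows, ["age", "region"], "bought"))
```

Because each split is chosen by looking only one level ahead, the work grows roughly with the number of attributes times the number of rows at each node, rather than exponentially as in an exhaustive search.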

artificial intelligence, decision tree learning, machine learning

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback