AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
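
As a minimal sketch of the idea in the quotation above, the Python snippet below encodes a small decision tree as nested tuples and evaluates it on an object described by its properties. The attribute names (patrons, hungry) and the tree itself are hypothetical, chosen only to show that such a tree computes a Boolean function; they are not taken from this page.

```python
# Hypothetical example: a hand-built decision tree for a yes/no decision.
# The attribute names and the tree structure are illustrative assumptions.

def decide(node, example):
    """Walk the tree from the root until a Boolean leaf is reached."""
    while not isinstance(node, bool):
        attribute, branches = node
        # Follow the branch that matches the example's value for this attribute.
        node = branches[example[attribute]]
    return node

# Internal nodes are (attribute, {value: subtree}); leaves are True or False.
WAIT_TREE = (
    "patrons",
    {
        "none": False,
        "some": True,
        "full": ("hungry", {True: True, False: False}),
    },
)

print(decide(WAIT_TREE, {"patrons": "full", "hungry": True}))   # True
print(decide(WAIT_TREE, {"patrons": "none", "hungry": True}))   # False
```

Replacing the Boolean leaves with class labels or numbers would give the "larger range of outputs" the quotation mentions; the traversal loop would then stop at any non-tuple leaf instead of a bool.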

Fundamentals of Decision Trees in Machine Learning

@machinelearnbot | May-13-2018, 18:47:56 GMT

Trees have many analogies in real life, and it turns out that they have influenced a wide area of machine learning, covering both classification and regression. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. If you're working towards an understanding of machine learning, it's important to know how to work with decision trees. This course covers the essentials of machine learning, including predictive analytics and working with decision trees. In this course, we'll explore several popular tree algorithms and learn how to use reverse engineering to identify specific variables.

artificial intelligence, decision tree learning, machine learning, (1 more...)

@machinelearnbot

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

A Simple and Effective Model-Based Variable Importance Measure

Greenwell, Brandon M., Boehmke, Bradley C., McCarthy, Andrew J.

arXiv.org Machine Learning | May-12-2018

In the era of "big data", it is becoming more of a challenge to not only build state-of-the-art predictive models, but also gain an understanding of what's really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms, like random forests and gradient boosted decision trees, have a natural way of quantifying the importance or relative influence of each feature. Other algorithms, like naive Bayes classifiers and support vector machines, are not capable of doing so, and model-free approaches are generally used to measure each predictor's importance. In this paper, we propose a standardized, model-based approach to measuring predictor importance across the growing spectrum of supervised learning algorithms. Our proposed method is illustrated through both simulated and real data examples. The R code to reproduce all of the figures in this paper is available in the supplementary materials.

algorithm, interaction effect, predictor, (14 more...)

arXiv.org Machine Learning

1805.04755

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > United States > Iowa > Story County > Ames (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)
(2 more...)

Top 10 Machine Learning Algorithms for Beginners

#artificialintelligence | May-10-2018, 17:51:05 GMT

The study of ML algorithms has gained immense traction since the Harvard Business Review article that termed 'Data Scientist' the 'Sexiest job of the 21st century'. So, for those starting out in the field of ML, we decided to do a reboot of our immensely popular Gold blog The 10 Algorithms Machine Learning Engineers need to know, although this post is targeted towards beginners. ML algorithms are those that can learn from data and improve from experience, without human intervention. Learning tasks may include learning the function that maps the input to the output; learning the hidden structure in unlabeled data; or 'instance-based learning', where a class label is produced for a new instance by comparing the new instance (row) to instances from the training data, which were stored in memory. 'Instance-based learning' does not create an abstraction from specific instances. Supervised learning can be explained as follows: use labeled training data to learn the mapping function from the input variables (X) to the output variable (Y).
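
To make the description of instance-based learning above concrete, here is a minimal sketch of a k-nearest-neighbours classifier in Python. It is one common instance-based method, used here as an illustration rather than as the approach of the linked post; the function name knn_predict, the choice of Euclidean distance, k=3, and the toy data are all assumptions.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest stored instances."""
    # Sort the stored (features, label) rows by Euclidean distance to the query.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training data (made up for illustration): (features, label) rows.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (5.1, 5.0)))  # b
```

Note that nothing is fitted ahead of time: the training rows are simply kept in memory and compared against each new instance, which is what "does not create an abstraction from specific instances" refers to.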

artificial intelligence, inductive learning, machine learning, (19 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Opinion Fraud Detection via Neural Autoencoder Decision Forest

Dong, Manqing, Yao, Lina, Wang, Xianzhi, Benatallah, Boualem, Huang, Chaoran, Ning, Xiaodong

arXiv.org Artificial Intelligence | May-9-2018

Online reviews play an important role in influencing buyers' daily purchase decisions. However, fake and meaningless reviews, which cannot reflect users' genuine purchase experience and opinions, widely exist on the Web and make it hard for users to make the right choices. Therefore, it is desirable to build a fair model that evaluates the quality of products by distinguishing spamming reviews. We present an end-to-end trainable unified model to leverage the appealing properties of autoencoders and random forests. A stochastic decision tree model is implemented to guide the global parameter learning process. Extensive experiments were conducted on a large Amazon review dataset. The proposed model consistently outperforms a series of compared methods.

artificial intelligence, machine learning, random forest, (19 more...)

arXiv.org Artificial Intelligence

1805.03379

Country: Oceania > Australia > New South Wales (0.04)

Genre: Research Report (1.00)

Industry: Law Enforcement & Public Safety > Fraud (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Introduction to Machine Learning for non-developers

#artificialintelligence | May-8-2018, 02:28:11 GMT

There are several types of predictive models. These models usually have several input columns and one target or outcome column, which is the variable to be predicted. So basically, a model performs a mapping between inputs and an output, finding (mysteriously, sometimes) the relationships between the input variables in order to predict any other variable. As you may notice, it has some commonalities with a human being who reads the environment, processes the information, and performs a certain action. It's about becoming familiar with one of the most-used predictive models: Random Forest (official algorithm site), implemented in R, one of the most-used models due to its simplicity in tuning and robustness across many different types of data.

decision tree learning, machine learning, predictive model, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.86)

Fighting Accounting Fraud Through Forensic Data Analytics

Jofre, Maria, Gerlach, Richard

arXiv.org Machine Learning | May-8-2018

Accounting fraud is a global concern representing a significant threat to financial system stability due to the resulting loss of market confidence and trust of regulatory authorities. Several tricks can be used to commit accounting fraud, hence the need for non-static regulatory interventions that take into account different fraudulent patterns. Accordingly, this study aims to improve the detection of accounting fraud via the implementation of several machine learning methods to better differentiate between fraud and non-fraud companies, and to further assist the task of examination within the riskier firms by evaluating relevant financial indicators. Out-of-sample results suggest there is great potential in detecting falsified financial statements through statistical modelling and analysis of publicly available accounting information. The proposed methodology can be of assistance to public auditors and regulatory agencies as it facilitates auditing processes, and supports more targeted and effective examinations of accounting reports.

accounting fraud, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1805.0284

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Government (1.00)
Banking & Finance > Trading (0.92)
(2 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(3 more...)

Wavelet Decomposition of Gradient Boosting

Dekel, Shai, Elisha, Oren, Morgan, Ohad

arXiv.org Machine Learning | May-7-2018

In this paper we introduce a significant improvement to the popular tree-based Stochastic Gradient Boosting algorithm using a wavelet decomposition of the trees. This approach is based on harmonic analysis and approximation theoretical elements, and as we show through extensive experimentation, our wavelet based method generally outperforms existing methods, particularly in difficult scenarios of class unbalance and mislabeling in the training data.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1805.02642

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Complete Analysis of a Random Forest Model

Klusowski, Jason M.

arXiv.org Machine Learning | May-7-2018

Random forests have become an important tool for improving accuracy in regression problems since their popularization by (Breiman, 2001) and others. In this paper, we revisit a random forest model originally proposed by (Breiman, 2004) and later studied by (Biau, 2012), where a feature is selected at random and the split occurs at the midpoint of the block containing the chosen feature. If the regression function is sparse and depends only on a small, unknown subset of $ S $ out of $ d $ features, we show that given $ n $ observations, this random forest model outputs a predictor that has a mean-squared prediction error of order $ \left(n\sqrt{\log^{S-1} n}\right)^{-\frac{1}{S\log2+1}} $. When $ S \leq \lfloor 0.72 d \rfloor $, this rate is better than the minimax optimal rate $ n^{-\frac{2}{d+2}} $ for $ d $-dimensional, Lipschitz function classes. As a consequence of our analysis, we show that the variance of the forest decays with the depth of the tree at a rate that is independent of the ambient dimension, even when the trees are fully grown. In particular, if $ \ell_{avg} $ (resp. $ \ell_{max} $) is the average (resp. maximum) number of observations per leaf node, we show that the variance of this forest is $ \Theta\left(\ell^{-1}_{avg}(\sqrt{\log n})^{-(S-1)}\right) $, which for the case of $ S = d $, is similar in form to the lower bound $ \Omega\left(\ell^{-1}_{max}(\log n)^{-(d-1)}\right) $ of (Lin and Jeon, 2006) for any random forest model with a nonadaptive splitting scheme. We also show that the bias is tight for any linear model with nonzero parameter vector. Thus, we completely characterize the fundamental limits of this random forest model. Our new analysis also implies that better theoretical performance can be achieved if the trees are grown less aggressively (i.e., grown to a shallower depth) than previous work would otherwise recommend.
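
The splitting rule described in the abstract above (a feature chosen at random, with the cut placed at the midpoint of the current block) can be sketched in a few lines of Python. This is an illustrative reading of that one sentence, not the paper's code: the fixed depth, the convention of predicting 0.0 in an empty cell, the number of trees, and the toy sparse target that depends only on the first of d = 5 features are all assumptions.

```python
import random

def grow(points, lo, hi, depth, rng):
    """Recursively partition the cell [lo, hi] for `depth` levels.

    At each node one feature is chosen uniformly at random and the cell is
    cut at the midpoint of its side along that feature. The data play no
    role in where the cut goes (a nonadaptive splitting scheme).
    """
    if depth == 0:
        ys = [y for _, y in points]
        # Leaf prediction: mean response in the cell (0.0 if the cell is empty).
        return sum(ys) / len(ys) if ys else 0.0
    j = rng.randrange(len(lo))
    mid = (lo[j] + hi[j]) / 2.0
    left = [(x, y) for x, y in points if x[j] < mid]
    right = [(x, y) for x, y in points if x[j] >= mid]
    left_hi = hi[:j] + [mid] + hi[j + 1:]
    right_lo = lo[:j] + [mid] + lo[j + 1:]
    return (j, mid,
            grow(left, lo, left_hi, depth - 1, rng),
            grow(right, right_lo, hi, depth - 1, rng))

def predict_tree(node, x):
    """Descend from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        j, mid, left, right = node
        node = left if x[j] < mid else right
    return node

def predict_forest(trees, x):
    """Average the predictions of the individual trees."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)

rng = random.Random(0)
d = 5
# Toy sparse regression function: the response depends only on feature 0.
X = [[rng.random() for _ in range(d)] for _ in range(2000)]
data = [(x, 2.0 * x[0]) for x in X]
trees = [grow(data, [0.0] * d, [1.0] * d, depth=6, rng=rng) for _ in range(50)]

print(predict_forest(trees, [0.9, 0.5, 0.5, 0.5, 0.5]))
print(predict_forest(trees, [0.1, 0.5, 0.5, 0.5, 0.5]))
```

Because splits land on irrelevant features as often as on the relevant one, each tree resolves feature 0 only coarsely, which gives some intuition for why the rate quoted in the abstract depends on the sparsity level S.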

artificial intelligence, machine learning, random forest, (17 more...)

arXiv.org Machine Learning

1805.02587

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Predicting animal adoption with Random Forest, SVM

@machinelearnbot | May-4-2018, 05:35:16 GMT

Joanne Lin, a student at Thinkful's data science bootcamp, decided to jump in and find insights that can help shelters get more pets rescued.

animal adoption, artificial intelligence, decision tree learning, (3 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Comparison of Classical and Nonlinear Models for Short-Term Electricity Price Prediction

Fata, Elaheh, Kadota, Igor, Schneider, Ian

arXiv.org Machine Learning | May-2-2018

Electricity is bought and sold in wholesale markets at prices that fluctuate significantly. Short-term forecasting of electricity prices is an important endeavor because it helps electric utilities control risk and because it influences competitive strategy for generators. As the "smart grid" grows, short-term price forecasts are becoming an important input to bidding and control algorithms for battery operators and demand response aggregators. While the statistics and machine learning literature offers many proposed methods for electricity price prediction, there is no consensus supporting a single best approach. We test two contrasting machine learning approaches for predicting electricity prices, regression decision trees and recurrent neural networks (RNNs), and compare them to a more traditional ARIMA implementation. We conduct the analysis on a challenging dataset of electricity prices from ERCOT, in Texas, where price fluctuation is especially high. We find that regression decision trees in particular achieve high performance compared to the other methods, suggesting that regression trees should be more carefully considered for electricity price forecasting.
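
For readers unfamiliar with regression decision trees, the sketch below shows the basic building block in Python: choosing the single threshold on one input that minimises the summed squared error of the two resulting groups. It is a generic illustration, not the setup used in the paper; the function name best_split and the hourly prices are made up.

```python
def best_split(xs, ys):
    """Return (threshold, sse) for the single split minimising squared error."""
    def sse(vals):
        # Sum of squared deviations from the group mean (0.0 for an empty group).
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    # Start from "no split": threshold None, error of predicting the global mean.
    best = (None, sse(ys))
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best

# Made-up hourly prices with a jump at hour 4.
hours = [0, 1, 2, 3, 4, 5, 6, 7]
prices = [20.0, 21.0, 19.0, 20.0, 80.0, 82.0, 79.0, 81.0]
threshold, error = best_split(hours, prices)
print(threshold)  # 4
```

A full regression tree would apply the same search recursively to each side and across several input features, predicting the mean of the training targets in each leaf.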

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Machine Learning

1805.05431

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.27)
North America > United States > Texas (0.25)

Genre: Research Report (0.82)

Industry: Energy > Power Industry > Utilities (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
