AITopics | Decision Tree Learning


Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.


Guide To Ensemble Methods: Bagging vs Boosting

#artificialintelligence | Dec-3-2020, 04:51:04 GMT

Building a highly accurate prediction model is a difficult task. Noise is the irreducible error, i.e. the part of the target value that the model cannot predict or explain. Because noise cannot be reduced (hence the term irreducible error), we shift our focus to reducing bias and variance. Ensemble learning methods reduce the bias and variance of a model by using multiple models together (hence the term ensemble) to achieve better predictive performance than a single model would. There are a number of ensemble methods; this article discusses two widely used ones, bagging and boosting. An ensemble uses either different learning algorithms, or a single learning algorithm multiple times, for prediction.
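To make the idea of combining multiple models concrete, here is a minimal sketch of bagging (bootstrap aggregating) in plain Python. It is an illustration under stated assumptions, not the article's own code: the one-feature decision stump as base learner, the toy data, and the function names `fit_stump`, `bagging_fit`, and `bagging_predict` are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-feature threshold classifier (decision stump) to (x, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        for left_label in (0, 1):
            # Predict left_label when x <= threshold, the other label otherwise.
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def bagging_fit(points, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: small x values are class 0, large x values are class 1.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
ensemble = bagging_fit(data)
print(bagging_predict(ensemble, 0.0), bagging_predict(ensemble, 5.0))
```

Averaging over many models trained on resampled data is what targets variance. Boosting instead trains models sequentially, each one concentrating on the examples its predecessors got wrong, and is not shown here.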

algorithm, classifier, prediction, (15 more...)


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)


Human vs. supervised machine learning: Who learns patterns faster?

Kühl, Niklas, Goutier, Marc, Baier, Lucas, Wolff, Clemens, Martin, Dominik

arXiv.org Artificial Intelligence | Nov-30-2020

The capabilities of supervised machine learning (SML), especially compared to human abilities, are being discussed in scientific research and in the usage of SML. This study provides an answer to how learning performance differs between humans and machines when there is limited training data. We have designed an experiment in which 44 humans and three different machine learning algorithms identify patterns in labeled training data and have to label instances according to the patterns they find. The results show a high dependency between performance and the underlying patterns of the task. Whereas humans perform relatively similarly across all patterns, machines show large performance differences for the various patterns in our experiment. After seeing 20 instances in the experiment, human performance does not improve anymore, which we relate to theories of cognitive overload. Machines learn more slowly but can reach the same level as humans, or may even outperform them, in 2 of the 4 patterns used. However, machines need more instances than humans for the same results. The performance of machines is comparably lower for the other 2 patterns due to the difficulty of combining input features.

experiment, learning, supervised machine, (17 more...)


2012.03661

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Education (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.46)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)


Data Preprocessing to Mitigate Bias with Boosted Fair Mollifiers

Soen, Alexander, Husain, Hisham, Nock, Richard

arXiv.org Machine Learning | Nov-30-2020

In a recent paper, Celis et al. (2020) introduced a new approach to fairness that corrects the data distribution itself. The approach is computationally appealing, but its approximation guarantees with respect to the target distribution can be quite loose, as they need to rely on a (typically limited) number of constraints on data-based aggregated statistics; this also results in a fairness guarantee which can be data dependent. Our paper makes use of a mathematical object recently introduced in privacy -- mollifiers of distributions -- and a popular approach to machine learning -- boosting -- to get an approach in the same lineage as Celis et al. but without those impediments, including, in particular, better guarantees in terms of accuracy and finer guarantees in terms of fairness. The approach involves learning the sufficient statistics of an exponential family. When training data is tabular, it is defined by decision trees whose interpretability can provide clues on the source of (un)fairness. Experiments display the quality of the results obtained for simulated and real-world data.

boosted fair mollifier, fairness, representation rate, (11 more...)


2012.00188

Country:

North America > United States > Florida > Broward County (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)


Python Decision Tree Guide: Make a Decision Tree Using Python

#artificialintelligence | Nov-27-2020, 23:41:02 GMT

Creating a decision tree in Python is a topic that raises a lot of questions for a beginner. What exactly is it, and what do we use it for? Where do we start building one, and what first steps do we take? Why do we use Python? Let's begin at the top. Simply put, a Python decision tree is a machine-learning method that we use for classification.

decision tree, python, python decision tree, (4 more...)


Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Decision Trees in Machine Learning (ML) with Python Tutorial

#artificialintelligence | Nov-27-2020, 20:16:12 GMT

This tutorial's code is available on GitHub, and its full implementation is also available on Google Colab. A decision tree is a vital and popular tool for classification and prediction problems in machine learning, statistics, and data mining [4]. It describes rules that can be interpreted by humans and applied in a knowledge system such as a database. It classifies a case by starting at the tree's root and passing through the tree until it reaches a leaf node. A decision tree uses nodes and leaves to make a decision.
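The root-to-leaf walk described above can be sketched in a few lines of Python. This is only an illustration, not the tutorial's code: the nested-dictionary representation, the function name `classify`, and the weather-style attributes in `tree` are hypothetical and chosen for this example.

```python
def classify(node, case):
    """Walk from the root to a leaf, following the branch that matches the case."""
    while isinstance(node, dict):              # internal node: tests one attribute
        value = case[node["attribute"]]
        node = node["branches"][value]         # descend along the matching branch
    return node                                # leaf: the decision itself


# Hypothetical hand-built tree; a learning algorithm would normally produce it.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))
print(classify(tree, {"outlook": "rainy", "windy": True}))
```

Each path from the root to a leaf reads as an if-then rule (for example, "if outlook is sunny and humidity is normal, then yes"), which is why such trees are considered human-interpretable.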

decision tree, entropy, impurity, (12 more...)


Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Conditional canonical correlation estimation based on covariates with random forests

Alakus, Cansu, Larocque, Denis, Jacquemont, Sebastien, Barlaam, Fanny, Martin, Charles-Olivier, Agbogba, Kristian, Lippe, Sarah, Labbe, Aurelie

arXiv.org Machine Learning | Nov-23-2020

Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender, or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error. We also show an application of the proposed method with EEG data.

canonical correlation, correlation, covariate, (15 more...)


2011.11555

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Meta-Learning for Time Series Forecasting Ensemble

Vaiciukynas, Evaldas, Danenas, Paulius, Kontrimas, Vilius, Butleris, Rimantas

arXiv.org Machine Learning | Nov-20-2020

The amount of historical data collected is growing together with the applicability of business intelligence and the demand for automatic forecasting of time series. While no single time series modeling method is universal to all types of dynamics, forecasting using an ensemble of several methods is often seen as a compromise. Instead of fixing ensemble diversity and size, we propose to adaptively predict these aspects using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time series features, to rank 22 univariate forecasting methods and to recommend ensemble size. The forecasting ensemble is consequently formed from the methods ranked as the best, and forecasts are pooled using either a simple or a weighted average (with weight corresponding to reciprocal rank). The proposed approach was tested on 12561 micro-economic time series (expanded to 38633 for various forecasting horizons) of the M4 competition, where meta-learning outperformed the Theta and Comb benchmarks by relative forecasting errors for all data types and horizons. The best overall results were achieved by weighted pooling, with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.
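The pooling step and the error measure mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `pool_forecasts` and `smape` and the sample numbers are hypothetical, and the sMAPE formula is assumed to be the common variant with the sum of absolute values in the denominator, which the abstract does not spell out.

```python
def pool_forecasts(ranked_forecasts, weighted=True):
    """Average forecasts listed best-ranked first.

    With weighted=True the method at rank r gets weight 1/r (reciprocal rank);
    with weighted=False every method gets the same weight (simple average).
    """
    weights = [1.0 / rank if weighted else 1.0
               for rank in range(1, len(ranked_forecasts) + 1)]
    total = sum(weights)
    horizon = len(ranked_forecasts[0])
    return [
        sum(w * f[t] for w, f in zip(weights, ranked_forecasts)) / total
        for t in range(horizon)
    ]


def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Assumed variant: mean of 2*|a - f| / (|a| + |f|); undefined when a = f = 0.
    """
    terms = [2.0 * abs(a - f) / (abs(a) + abs(f))
             for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical two-step forecasts from two methods, best-ranked first.
forecasts = [[10.0, 12.0], [13.0, 15.0]]
print(pool_forecasts(forecasts))                  # reciprocal-rank weights 1 and 1/2
print(pool_forecasts(forecasts, weighted=False))  # simple average
print(smape([11.0, 13.0], pool_forecasts(forecasts)))
```

Reciprocal-rank weighting lets the top-ranked method dominate while lower-ranked methods still contribute, in contrast to the simple average, which treats all selected methods equally.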

forecast, forecasting, time series, (14 more...)


2011.10545

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Europe > Lithuania > Kaunas County > Kaunas (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)


From Decision Trees and Random Forests to Gradient Boosting

#artificialintelligence | Nov-19-2020, 11:50:36 GMT

Suppose we wish to perform supervised learning on a classification problem to determine whether an incoming email is spam or not. The spam dataset consists of 4601 emails, each labelled as real, i.e. not spam (0), or spam (1). The data also contains a large number of predictors (57), each of which is either a character count or the frequency of occurrence of a certain word or symbol. In this short article, we will briefly cover the main concepts in tree-based classification and compare and contrast the most popular methods. This dataset and several worked examples are covered in detail in The Elements of Statistical Learning, 2nd edition.

decision tree and random forest, frequency, spam, (3 more...)


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)


Interpretability, Explainability, and Machine Learning

#artificialintelligence | Nov-19-2020, 01:26:58 GMT

Susan will present "Understanding and Addressing Bias in Analytics" at CONVERGE, December 1-2. This article was originally published on KDnuggets. I use one of those credit monitoring services that regularly emails me about my credit score: "Congratulations, your score has gone up!" "Uh oh, your score has gone down!" I shrug and delete the emails. Credit scores are just one example of the many automated decisions made about us as individuals on the basis of complex models.

explainability, interpretability, prediction, (14 more...)


Industry:

Information Technology > Security & Privacy (0.70)
Banking & Finance (0.56)
Law (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)


How to Future-Proof Your Data Science Project - KDnuggets

#artificialintelligence | Nov-18-2020, 13:25:10 GMT

Nontechnical stakeholders struggle to define business requirements. Cross-functional teams face an uphill battle to set up robust pipelines for replicable data delivery. Machine learning models can take on a life of their own. If you've been ignoring these critical elements in the past, you may find your deployment rate skyrockets. Your data products may depend on correctly deploying the tips from this article.

data product, data scientist, neural network, (13 more...)


Country:

North America > United States > Wisconsin (0.05)
North America > United States > New York (0.05)
North America > United States > California (0.05)

Industry: Law (0.47)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.51)
