AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
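
To make the quoted definition concrete, here is a minimal Python sketch of a decision tree used as a Boolean function. The tree itself, its property names (`patrons`, `hungry`), and the `decide` helper are illustrative assumptions and are not taken from the textbook.

```python
# Internal nodes are (property, branches) tuples; leaves are the yes/no decision.
# The properties and branch values below are hypothetical.
TREE = ("patrons", {
    "none": False,
    "some": True,
    "full": ("hungry", {True: True, False: False}),
})


def decide(tree, example):
    """Walk from the root to a leaf, following the branch that matches each tested property."""
    while isinstance(tree, tuple):
        prop, branches = tree
        tree = branches[example[prop]]
    return tree


answer = decide(TREE, {"patrons": "full", "hungry": True})
```

Each root-to-leaf path tests a conjunction of properties, which is why such a tree can represent a Boolean function of its inputs.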

A survey of bias in Machine Learning through the prism of Statistical Parity for the Adult Data Set

Besse, Philippe, del Barrio, Eustasio, Gordaliza, Paula, Loubes, Jean-Michel, Risser, Laurent

arXiv.org Machine Learning, Apr-6-2020

Applications based on Machine Learning models have now become an indispensable part of everyday life and the professional world. A critical question has recently arisen among the public: Do algorithmic decisions convey any type of discrimination against specific groups of the population or minorities? In this paper, we show the importance of understanding how a bias can be introduced into automatic decisions. We first present a mathematical framework for the fair learning problem, specifically in the binary classification setting. We then propose to quantify the presence of bias by using the standard Disparate Impact index on the real and well-known Adult income data set. Finally, we check the performance of different approaches aiming to reduce the bias in binary classification outcomes. Importantly, we show that some intuitive methods are ineffective. This sheds light on the fact that trying to make fair machine learning models may be a particularly challenging task, in particular when the training observations contain a bias.

algorithm, discrimination, positive rate, (15 more...)

arXiv:2003.14263

Country:

North America > United States > California (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > New York > New York County > New York City (0.04)
(37 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Law (1.00)
Information Technology (0.93)
Government (0.93)
Banking & Finance > Credit (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
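
The Disparate Impact index named in the abstract above is commonly defined as the ratio of positive-prediction rates between the protected group and the rest of the population. The sketch below assumes that definition; the function names and the toy data are illustrative, and the paper's exact notation may differ.

```python
def positive_rate(predictions):
    """Fraction of predictions equal to 1."""
    return sum(predictions) / len(predictions)


def disparate_impact(predictions, protected):
    """Positive rate of the protected group divided by that of everyone else.

    `predictions` holds 0/1 decisions; `protected` holds matching booleans.
    Values near 1 suggest similar treatment; values near 0 mean the protected
    group rarely receives the positive outcome.
    """
    in_group = [p for p, s in zip(predictions, protected) if s]
    out_group = [p for p, s in zip(predictions, protected) if not s]
    if not in_group or not out_group:
        raise ValueError("both groups must contain at least one example")
    reference_rate = positive_rate(out_group)
    if reference_rate == 0:
        raise ValueError("reference group has no positive predictions")
    return positive_rate(in_group) / reference_rate


# Hypothetical toy data: 4 protected and 4 unprotected individuals.
predictions = [1, 0, 0, 0, 1, 1, 1, 0]
protected = [True, True, True, True, False, False, False, False]
di = disparate_impact(predictions, protected)
```

Here one in four protected individuals and three in four others receive the positive decision, so the ratio is one third.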

Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning

Valencia-Zapata, Gustavo A., Ersoy, Okan, Gonzalez-Canas, Carolina, Zentner, Michael G., Klimeck, Gerhard

arXiv.org Artificial Intelligence, Apr-6-2020

Several studies point out different causes of performance degradation in supervised machine learning. Problems such as class imbalance, overlapping, small-disjuncts, noisy labels, and sparseness limit accuracy in classification algorithms. Even though a number of approaches either in the form of a methodology or an algorithm try to minimize performance degradation, they have been isolated efforts with limited scope. Most of these approaches focus on remediation of one among many problems, with experimental results coming from few datasets and classification algorithms, insufficient measures of prediction power, and lack of statistical validation for testing the real benefit of the proposed approach. This paper consists of two main parts: In the first part, a novel probabilistic diagnostic model based on identifying signs and symptoms of each problem is presented. Thereby, early and correct diagnosis of these problems is to be achieved in order to select not only the most convenient remediation treatment but also unbiased performance metrics. Secondly, the behavior and performance of several supervised algorithms are studied when training sets have such problems. Therefore, prediction of success for treatments can be estimated across classifiers.

dataset, degradation problem, subclass, (13 more...)

arXiv:2004.02988

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Health & Medicine > Health Care Technology (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
(4 more...)

XtracTree for Regulator Validation of Bagging Methods Used in Retail Banking

Charlier, Jeremy, Makarenkov, Vladimir

arXiv.org Artificial Intelligence, Apr-5-2020

Bootstrap aggregation, known as bagging, is one of the most popular ensemble methods used in machine learning (ML). An ensemble method is a supervised ML method that combines multiple hypotheses to form a single hypothesis used for prediction. A bagging algorithm combines multiple classifiers modelled on different sub-samples of the same data set to build one large classifier. Large retail banks now use ML algorithms, including decision trees and random forests, to optimize retail banking activities. However, AI bank researchers face a strong challenge from their own model validation department as well as from national financial regulators. Each proposed ML model has to be validated, and clear rules for every algorithm-based decision have to be established. In this context, we propose XtracTree, an algorithm that is capable of effectively converting an ML bagging classifier, such as a decision tree or a random forest, into simple "if-then" rules satisfying the requirements of model validation. Our algorithm is also capable of highlighting the decision path for each individual sample or a group of samples, addressing any concern from the regulators regarding the ML "black box". We use a public loan data set from Kaggle to illustrate the usefulness of our approach. Our experiments indicate that, using XtracTree, we are able to ensure a better understanding of our model, leading to easier model validation by national financial regulators and the internal model validation department.

algorithm, classifier, xtractree, (15 more...)

arXiv:2004.02326

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > China (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.64)

Industry:

Law > Business Law (0.75)
Banking & Finance > Loans (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
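
The idea of turning a tree into "if-then" rules, as described in the abstract above, can be sketched by listing every root-to-leaf path of a single tree. This is a generic illustration and not the XtracTree algorithm; the tree, its feature names (`income`, `debt_ratio`), and its thresholds are hypothetical.

```python
def extract_rules(tree, conditions=()):
    """Yield one (conditions, decision) pair per root-to-leaf path.

    Internal nodes are (feature, threshold, left, right) tuples, where the
    left branch is taken when feature <= threshold. Leaves are plain labels.
    """
    if not isinstance(tree, tuple):
        yield list(conditions), tree
        return
    feature, threshold, left, right = tree
    yield from extract_rules(left, conditions + (f"{feature} <= {threshold}",))
    yield from extract_rules(right, conditions + (f"{feature} > {threshold}",))


# Hypothetical loan-approval tree.
TREE = ("income", 30000, "reject", ("debt_ratio", 0.4, "approve", "reject"))

rules = [f"IF {' AND '.join(conds)} THEN {label}" for conds, label in extract_rules(TREE)]
for rule in rules:
    print(rule)
```

For a bagged ensemble, the same traversal could be applied to each member tree, although combining the resulting rule sets into one compact list is the harder problem that the paper addresses.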

Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods

Kerwin, Kathleen, Bastian, Nathaniel D.

arXiv.org Machine Learning, Apr-3-2020

This study uses stacked generalization, which is a two-step process of combining machine learning methods, called meta or super learners, for improving the performance of algorithms in step one (by minimizing the error rate of each individual algorithm to reduce its bias in the learning set) and then in step two inputting the results into the meta learner with its stacked blended output (demonstrating improved performance with the weakest algorithms learning better). The method is essentially an enhanced cross-validation strategy. Although the process uses great computational resources, the resulting performance metrics on resampled fraud data show that increased system cost can be justified. A fundamental key to fraud data is that it is inherently not systematic and, as of yet, the optimal resampling methodology has not been identified. Building a test harness that accounts for all permutations of algorithm sample set pairs demonstrates that the complex, intrinsic data structures are all thoroughly tested. Using a comparative analysis on fraud data that applies stacked generalizations provides useful insight needed to find the optimal mathematical formula to be used for imbalanced fraud data sets.

classifier, imbalanced fraud data set, prediction, (9 more...)

arXiv:2004.01764

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.14)
North America > United States > Illinois > Cook County > Evanston (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Unpack Local Model Interpretation for GBDT

Fang, Wenjing, Zhou, Jun, Li, Xiaolong, Zhu, Kenny Q.

arXiv.org Machine Learning, Apr-2-2020

Because GBDT inherits the good performance from its ensemble essence, much attention has been drawn to the optimization of this model. With its popularization, an increasing need for model interpretation arises. Besides the commonly used feature importance as a global interpretation, feature contribution is a local measure that reveals the relationship between a specific instance and the related output. This work focuses on the local interpretation and proposes a unified computation mechanism to get the instance-level feature contributions for GBDT in any version. The practicality of this mechanism is validated by the listed experiments as well as applications in real industry scenarios.

feature contribution, gbdt, interpretation, (15 more...)

arXiv:2004.01358

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
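
One common way to obtain instance-level feature contributions from a single tree is to walk the decision path and credit each change in node value to the feature tested at that node. The sketch below assumes this path-based attribution on one hand-built regression tree; it is not the unified GBDT mechanism proposed in the paper above, and the tree, feature names, and node values are hypothetical.

```python
def explain(tree, x):
    """Return (bias, contributions, prediction) for one instance.

    Every node stores the mean target `value` of the training samples that
    reached it. Internal nodes also store `feature`, `threshold`, `left`, and
    `right`; the left branch is taken when x[feature] <= threshold.
    """
    node = tree
    bias = node["value"]  # prediction before any feature is examined
    contributions = {}
    while "feature" in node:
        feature = node["feature"]
        child = node["left"] if x[feature] <= node["threshold"] else node["right"]
        # Credit the change in node value to the feature that caused the move.
        contributions[feature] = contributions.get(feature, 0.0) + child["value"] - node["value"]
        node = child
    return bias, contributions, node["value"]


# Hypothetical tree with two tested features.
TREE = {
    "value": 0.5, "feature": "age", "threshold": 30,
    "left": {"value": 0.2},
    "right": {
        "value": 0.7, "feature": "income", "threshold": 50,
        "left": {"value": 0.6},
        "right": {"value": 0.9},
    },
}

bias, contributions, prediction = explain(TREE, {"age": 40, "income": 60})
```

By construction, the bias plus the sum of the contributions equals the leaf prediction, which is what makes the attribution a local explanation of that one output. For a boosted ensemble, per-tree contributions could be summed, subject to how the library scales each tree.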

Machine Learning in GIS: Understand the Theory and Practice

#artificialintelligence, Apr-1-2020, 03:43:43 GMT

This course is designed to equip you with the theoretical and practical knowledge of Machine Learning as applied to geospatial analysis, namely Geographic Information Systems (GIS) and Remote Sensing. By the end of the course, you will feel confident and completely understand the Machine Learning applications in GIS technology and how to use Machine Learning algorithms for various geospatial tasks, such as land use and land cover mapping (classification) and object-based image analysis (segmentation). This course will also prepare you for using GIS with free and open source software tools. In the course, you will be able to apply Machine Learning algorithms such as Random Forest, Support Vector Machines and Decision Trees (and others) to the classification of satellite imagery. On top of that, you will practice GIS by completing an entire GIS project that explores the power of Machine Learning, cloud computing and Big Data analysis using Google Earth Engine for any geographic area in the world.

machine learning, machine learning algorithm, theory and practice, (5 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.64)

Sequential Feature Classification in the Context of Redundancies

Pfannschmidt, Lukas, Hammer, Barbara

arXiv.org Machine Learning, Apr-1-2020

The problem of all-relevant feature selection is concerned with finding a relevant feature set with preserved redundancies. There exist several approximations to solve this problem but only one could give a distinction between strong and weak relevance. This approach was limited to the case of linear problems. In this work, we present a new solution for this distinction in the non-linear case through the use of random forest models and statistical methods.

feature selection, importance value, relevant feature, (14 more...)

arXiv:2004.00658

Country: Europe > Germany (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.37)

SAS and R Integration for Machine Learning

#artificialintelligence, Mar-30-2020, 07:35:49 GMT

R first appeared in 1993 and has gained a steady and fiercely loyal fan base. But as data sets become both longer and wider, storage and processing speeds become an issue. Having spent weeks whipping an extremely wide and messy data set into shape using only R, I am so grateful for SAS Viya and not having to go through that again. SAS Viya is a cloud-enabled, in-memory analytics engine which allows for rapid analytics insights. SAS Viya utilizes the SAS Cloud Analytics Services (CAS) to perform various actions and tasks.

action set, machine learning, sas viya, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.37)

Machine Learning Advanced: Decision Trees in Python

#artificialintelligence, Mar-28-2020, 17:20:25 GMT

Free course: Machine Learning Advanced: Decision Trees in Python [2020]. Use decision trees to solve business problems and build high-accuracy prediction models in Python. Learn how to use decision trees to make predictions for business problems using Python. Start with this advanced machine learning tutorial today! Instructor: Start Tes. About this course: The course is built on three pillars of learning: Know (study), Do (practice), and Review (self-feedback). Know: We have created a set of concise and comprehensive videos to teach you all the Excel-related skills you will need in your professional career. Do: With each lecture, we have provided a practice sheet to complement the learning in the lecture video. These sheets are carefully designed to further clarify the concepts and help you implement them on practical problems faced on the job.

decision tree, machine learning advanced, python, (3 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

From unbiased MDI Feature Importance to Explainable AI for Trees

Loecher, Markus

arXiv.org Machine Learning, Mar-26-2020

We attempt to give a unifying view of the various recent attempts to (i) improve the interpretability of tree-based models and (ii) debias the default variable-importance measure in random forests, Gini importance. In particular, we demonstrate a common thread among the out-of-bag based bias correction methods and their connection to local explanation for trees. In addition, we point out a bias caused by the inclusion of inbag data in the newly developed explainable AI for trees algorithms.

contribution, feature contribution, oob, (12 more...)

arXiv:2003.12043

Country:

Europe > Germany > Berlin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.61)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.61)
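
Gini importance, also called mean decrease in impurity (MDI), credits a feature with the impurity reduction achieved by the splits that use it. The sketch below computes that reduction for a single split on toy labels; the helper names and data are illustrative, and it shows only the default in-sample quantity, not the out-of-bag corrections discussed in the abstract above.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


# Hypothetical labels at one node and two candidate splits.
parent = [0, 0, 1, 1]
perfect = impurity_decrease(parent, [0, 0], [1, 1])  # children are pure
useless = impurity_decrease(parent, [0, 1], [0, 1])  # children mirror the parent
```

In a forest, these per-split decreases are typically weighted by the share of samples reaching the node, summed per feature, and averaged over trees.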
