AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

3 decision tree-based algorithms for Machine Learning

#artificialintelligenceNov-18-2020, 01:31:33 GMT

Decision trees are a tree algorithm that split the data based on certain decisions. Look at the image below of a very simple decision tree. We want to decide if an animal is a cat or a dog based on 2 questions. We can answer each question and depending on the answer, we can classify the animal as either a dog or a cat. The red lines represent the answer "NO" and the green line, "YES".

decision tree, leaf node, prediction, (10 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.90)

Add feedback

Decision Trees, Random Forests, AdaBoost & XGBoost in Python

#artificialintelligenceNov-17-2020, 14:48:45 GMT

You're looking for a complete Decision tree course that teaches you everything you need to create a Decision tree/ Random Forest/ XGBoost model in Python, right? You've found the right Decision Trees and tree based advanced techniques course! How this course will help you? A Verifiable Certificate of Completion is presented to all students who undertake this Machine learning advanced course. If you are a business manager or an executive, or a student who wants to learn and apply machine learning in Real world problems of business, this course will give you a solid base for that by teaching you some of the advanced technique of machine learning, which are Decision tree, Random Forest, Bagging, AdaBoost and XGBoost.

decision tree, machine learning, python, (12 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Regression Trees for Cumulative Incidence Functions

#artificialintelligenceNov-16-2020, 23:11:56 GMT

The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past decade. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and related ensemble methods, have begun only recently. In this paper, we develop a novel approach to building regression trees for estimating cumulative incidence curves in a competing risks setting. The proposed methods employ augmented estimators of the Brier score risk as the primary basis for building and pruning trees.

cumulative incidence function, regression tree

#artificialintelligence

Industry: Health & Medicine (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Add feedback

A Survey on the Explainability of Supervised Machine Learning

Burkart, Nadia, Huber, Marco F.

arXiv.org Machine LearningNov-16-2020

Predictions obtained by, e.g., artificial neural networks have a high accuracy but humans often perceive the models as black boxes. Insights about the decision making are mostly opaque for humans. Particularly understanding the decision making in highly sensitive areas such as healthcare or fifinance, is of paramount importance. The decision-making behind the black boxes requires it to be more transparent, accountable, and understandable for humans. This survey paper provides essential definitions, an overview of the different principles and methodologies of explainable Supervised Machine Learning (SML). We conduct a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions. Finally, we illustrate principles by means of an explanatory case study and discuss important future directions.

explanation, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2011.07876

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
Health & Medicine > Therapeutic Area (0.93)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
(9 more...)

Add feedback

Precision-Recall Curve (PRC) Classification Trees

Miao, Jiaju, Zhu, Wei

arXiv.org Machine LearningNov-15-2020

The classification of imbalanced data has presented a significant challenge for most well-known classification algorithms that were often designed for data with relatively balanced class distributions. Nevertheless skewed class distribution is a common feature in real world problems. It is especially prevalent in certain application domains with great need for machine learning and better predictive analysis such as disease diagnosis, fraud detection, bankruptcy prediction, and suspect identification. In this paper, we propose a novel tree-based algorithm based on the area under the precision-recall curve (AUPRC) for variable selection in the classification context. Our algorithm, named as the "Precision-Recall Curve classification tree", or simply the "PRC classification tree" modifies two crucial stages in tree building. The first stage is to maximize the area under the precision-recall curve in node variable selection. The second stage is to maximize the harmonic mean of recall and precision (F-measure) for threshold selection. We found the proposed PRC classification tree, and its subsequent extension, the PRC random forest, work well especially for class-imbalanced data sets. We have demonstrated that our methods outperform their classic counterparts, the usual CART and random forest for both synthetic and real data. Furthermore, the ROC classification tree proposed by our group previously has shown good performance in imbalanced data. The combination of them, the PRC-ROC tree, also shows great promise in identifying the minority class.

algorithm, classifier, random forest, (14 more...)

arXiv.org Machine Learning

2011.0764

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.05)
North America > United States > Wisconsin (0.04)

Genre: Research Report (1.00)

Industry:

Banking & Finance (0.47)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Regression Trees for Cumulative Incidence Functions

Cho, Youngjoo, Molinaro, Annette M., Hu, Chen, Strawderman, Robert L.

arXiv.org Machine LearningNov-12-2020

A subject being followed over time may experience several types of events related, for example, to disease morbidity and mortality. For example, in a Phase III trial of concomitant versus sequential chemotherapy and thoracic radiotherapy for patients with inoperable non-small cell lung cancer (NSCLC) conducted by the Radiation Therapy Oncology Group (RTOG), patients were followed up to 5 years, the occurrence of either disease progression or death being of particular interest. Such "competing risks" data are commonly encountered in cancer and other biomedical followup studies, in addition to the potential complication of right-censoring on the event time(s) of interest. Two quantities are often used when analyzing competing risks data: the cause-specific hazard function (CSH) and the cumulative incidence function (CIF). For a given event, the former describes the instantaneous risk of this event at time t, given that no events have yet occurred; the latter describes the probability of occurrence, or absolute risk, of that event across time and can be derived directly from the subdistribution hazard function (Fine and Gray, 1999).

ipcw 2, loss function, npar, (14 more...)

arXiv.org Machine Learning

2011.06706

Country:

North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > California > Monterey County > Monterey (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Interpretable Machine Learning for COVID-19: An Empirical Study on Severity Prediction Task

Wu, Han, Ruan, Wenjie, Wang, Jiangtao, Zheng, Dingchang, Li, Shaolin, Chen, Jian, Li, Kunwei, Chai, Xiangfei, Helal, Sumi

arXiv.org Machine LearningNov-12-2020

Black-box nature hinders the deployment of many high-accuracy models in medical diagnosis. It is risky to put one's life in the hands of models that medical researchers do not trust. However, to understand the mechanism of a new virus, such as COVID-19, machine learning models may catch important symptoms that medical practitioners do not notice due to the surge of infected patients during a pandemic. In this work, the interpretation of machine learning models reveals that a high C-reactive protein (CRP) corresponds to severe infection, and severe patients usually go through a cardiac injury, which is consistent with well-established medical knowledge. Additionally, through the interpretation of machine learning models, we find phlegm and diarrhea are two important symptoms, without which indicate a high risk of turning severe. These two symptoms are not recognized at the early stage of the outbreak, whereas our findings are corroborated by later autopsies of COVID-19 patients. We find patients with a high N-terminal pro B-type natriuretic peptide (NTproBNP) have a significantly increased risk of death which does not receive much attention initially but proves true by the following-up study. Thus, we suggest interpreting machine learning models can offer help to diagnosis at the early stage of an outbreak.

covid-19, explanation, prediction, (13 more...)

arXiv.org Machine Learning

2010.02006

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
Asia > China > Guangdong Province > Zhuhai (0.05)
Europe > United Kingdom (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.33)

Add feedback

An Embedded Model Estimator for Non-Stationary Random Functions using Multiple Secondary Variables

Daly, Colin

arXiv.org Machine LearningNov-10-2020

An algorithm for non-stationary spatial modelling using multiple secondary variables is developed. It combines Geostatistics with Quantile Random Forests to give a new interpolation and stochastic simulation algorithm. This paper introduces the method and shows that it has consistency results that are similar in nature to those applying to geostatistical modelling and to Quantile Random Forests. The method allows for embedding of simpler interpolation techniques, such as Kriging, to further condition the model. The algorithm works by estimating a conditional distribution for the target variable at each target location. The family of such distributions is called the envelope of the target variable. From this, it is possible to obtain spatial estimates, quantiles and uncertainty. An algorithm to produce conditional simulations from the envelope is also developed. As they sample from the envelope, realizations are therefore locally influenced by relative changes of importance of secondary variables, trends and variability.

decision tree learning, simulation, upstream oil & gas, (19 more...)

arXiv.org Machine Learning

2011.04116

Country:

North America > United States (0.14)
Europe > Netherlands (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

The Macroeconomy as a Random Forest

Coulombe, Philippe Goulet

arXiv.org Machine LearningNov-8-2020

I develop Macroeconomic Random Forest (MRF), an algorithm adapting the canonical Machine Learning (ML) tool to flexibly model evolving parameters in a linear macro equation. Its main output, Generalized Time-Varying Parameters (GTVPs), is a versatile device nesting many popular nonlinearities (threshold/switching, smooth transition, structural breaks/change) and allowing for sophisticated new ones. The approach delivers clear forecasting gains over numerous alternatives, predicts the 2008 drastic rise in unemployment, and performs well for inflation. Unlike most ML-based methods, MRF is directly interpretable -- via its GTVPs. For instance, the successful unemployment forecast is due to the influence of forward-looking variables (e.g., term spreads, housing starts) nearly doubling before every recession. Interestingly, the Phillips curve has indeed flattened, and its might is highly cyclical.

forecast, inflation, recession, (16 more...)

arXiv.org Machine Learning

2006.12724

Country:

Europe > Netherlands (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Economy (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Enhash: A Fast Streaming Algorithm For Concept Drift Detection

Jindal, Aashi, Gupta, Prashant, Sengupta, Debarka, Jayadeva, null

arXiv.org Machine LearningNov-7-2020

We propose Enhash, a fast ensemble learner that detects \textit{concept drift} in a data stream. A stream may consist of abrupt, gradual, virtual, or recurring events, or a mixture of various types of drift. Enhash employs projection hash to insert an incoming sample. We show empirically that the proposed method has competitive performance to existing ensemble learners in much lesser time. Also, Enhash has moderate resource requirements. Experiments relevant to performance comparison were performed on 6 artificial and 4 real data sets consisting of various types of drifts.

concept drift, enhash, fast streaming algorithm, (12 more...)

arXiv.org Machine Learning

2011.03729

Country: Asia > India (0.05)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

Add feedback