AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

A review on longitudinal data analysis with random forest in precision medicine

Hu, Jianchang, Szymczak, Silke

arXiv.org Artificial Intelligence, Aug-8-2022

Precision medicine provides customized treatments to patients based on their characteristics and is a promising approach to improving treatment efficiency. Large scale omics data are useful for patient characterization, but often their measurements change over time, leading to longitudinal data. Random forest is one of the state-of-the-art machine learning methods for building prediction models, and can play a crucial role in precision medicine. In this paper, we review extensions of the standard random forest method for the purpose of longitudinal data analysis. Extension methods are categorized according to the data structures for which they are designed. We consider both univariate and multivariate responses and further categorize the repeated measurements according to whether the time effect is relevant. Information of available software implementations of the reviewed extensions is also given. We conclude with discussions on the limitations of our review and some future research directions.

artificial intelligence, longitudinal data, machine learning, (17 more...)

doi: 10.1093/bib/bbad002

2208.04112

Country:

Europe > Germany (0.14)
North America > United States > Montana (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Machine Learning and Bioinformatics for Diagnosis Analysis of Obesity Spectrum Disorders

Gasmi, Amin

arXiv.org Machine Learning, Aug-5-2022

Globally, the number of obese patients has doubled due to sedentary lifestyles and improper dieting. The tremendous increase altered human genetics and health. According to the World Health Organization, life expectancy dropped from 80 to 75 years, as obese people struggle with different chronic diseases. This report will address the problems of obesity in children and adults using ML datasets to feature, predict, and analyze the causes of obesity. By engaging neural ML networks, we will explore neural control using diffusion tensor imaging to consider body fats, BMI, and waist & hip ratio circumference of obese patients. To predict the present and future causes of obesity with ML, we will discuss ML techniques like decision trees, SVM, RF, GBM, LASSO, BN, and ANN and use datasets to implement the stated algorithms. Different theoretical literature from experts' ML & Bioinformatics experiments will be outlined in this report while making recommendations on how to advance ML for predicting obesity and other chronic diseases.

artificial intelligence, bioinformatics, machine learning, (18 more...)

2208.03139

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(5 more...)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

The New Machine Learning Specialization: in-depth review

#artificialintelligence, Aug-3-2022, 09:10:19 GMT

The lectures start by defining decision trees and the splitting criteria, then cover different uses of the tree, such as applying the algorithm to categorical features, splitting on continuous features, and using trees for regression problems. They then explain how to combine multiple trees with ensemble learning to build a random forest. The last lecture takes a brief look at XGBoost and how to use it, without going into further detail. This is probably the most hyped part of the whole specialization; I found many people celebrating that this introductory course discusses such topics.

in-depth review, new machine learning specialization, state-action value function

Genre: Instructional Material > Course Syllabus & Notes (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.43)

Automated fault tree learning from continuous-valued sensor data: a case study on domestic heaters

Verkuil, Bart, Budde, Carlos E., Bucur, Doina

arXiv.org Artificial Intelligence, Aug-3-2022

Many industrial sectors have been collecting big sensor data. With recent technologies for processing big data, companies can exploit this for automatic failure detection and prevention. We propose the first completely automated method for failure analysis, machine-learning fault trees from raw observational data with continuous variables. Our method scales well and is tested on a real-world, five-year dataset of domestic heater operations in The Netherlands, with 31 million unique heater-day readings, each containing 27 sensor and 11 failure variables. Our method builds on two previous procedures: the C4.5 decision-tree learning algorithm, and the LIFT fault tree learning algorithm from Boolean data. C4.5 pre-processes each continuous variable: it learns an optimal numerical threshold which distinguishes between faulty and normal operation of the top-level system. These thresholds discretise the variables, thus allowing LIFT to learn fault trees which model the root failure mechanisms of the system and are explainable. We obtain fault trees for the 11 failure variables, and evaluate them in two ways: quantitatively, with a significance score, and qualitatively, with domain specialists. Some of the fault trees learnt have almost maximum significance (above 0.95), while others have medium-to-low significance (around 0.30), reflecting the difficulty of learning from big, noisy, real-world sensor data. The domain specialists confirm that the fault trees model meaningful relationships among the variables.
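
The threshold-learning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain information gain and midpoint candidates, whereas C4.5 itself adds refinements such as the gain ratio. The function names and the temperature readings are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (None, 0.0) when no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    base = entropy([y for _, y in pairs])
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain
    return best_t, best_gain


def discretise(values, threshold):
    """Turn a continuous variable into a Boolean one: True when above the threshold."""
    return [v > threshold for v in values]


# Hypothetical sensor readings (made-up numbers) and whether a failure was observed.
temperatures = [55.0, 58.0, 61.0, 80.0, 84.0, 90.0]
failed = [False, False, False, True, True, True]

threshold, gain = best_threshold(temperatures, failed)
boolean_feature = discretise(temperatures, threshold)
```

The resulting Boolean column is the kind of input a Boolean fault tree learner such as LIFT expects, which is why discretisation comes first in the pipeline the abstract describes.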

fault tree, sensor variable, temp, (14 more...)

doi: 10.36001/ijphm.2022.v13i2.3160

2203.07374

Country:

Europe > Netherlands (0.25)
North America > United States > California > Santa Clara County > San Jose (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.68)
Energy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Machine Learning Pipelines

#artificialintelligence, Aug-2-2022, 15:19:31 GMT

In this use case, we will be using the Titanic dataset. In this dataset, we will apply some common Transformers on certain columns and then we will use a Decision Tree Estimator to classify whether the passenger will live or die. Here is the plan outline for our use case. To make our use case easy to understand, let us see the diagram below. This will give you a fairly good understanding of the pipeline visually.
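
The pattern this teaser describes, transformers applied to certain columns followed by a tree-based estimator, can be sketched without any library. Everything below is an assumption made for illustration: the class names are hypothetical, the four passenger rows are invented rather than taken from the Titanic dataset, and a depth-1 tree (a decision stump) stands in for the full decision tree estimator. The article itself does not show this code.

```python
class FillMissingAge:
    """Transformer: replace missing ages with the mean age seen during fit."""

    def fit(self, rows):
        ages = [r["age"] for r in rows if r["age"] is not None]
        self.mean_age = sum(ages) / len(ages)
        return self

    def transform(self, rows):
        return [{**r, "age": self.mean_age if r["age"] is None else r["age"]} for r in rows]


class EncodeSex:
    """Transformer: map the categorical 'sex' column to 0 (male) or 1 (female)."""

    def fit(self, rows):
        return self  # nothing to learn for this fixed encoding

    def transform(self, rows):
        return [{**r, "sex": 1 if r["sex"] == "female" else 0} for r in rows]


class DecisionStump:
    """Estimator: a depth-1 decision tree on a single numeric column.

    Assumes the column takes at least two distinct values in the training rows.
    """

    def __init__(self, column):
        self.column = column

    def fit(self, rows, labels):
        values = sorted({r[self.column] for r in rows})
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
        best = None
        for t in candidates:
            for above in (0, 1):  # class predicted when the value exceeds t
                preds = [above if r[self.column] > t else 1 - above for r in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, t, above)
        _, self.threshold, self.above = best
        return self

    def predict(self, rows):
        return [self.above if r[self.column] > self.threshold else 1 - self.above
                for r in rows]


class Pipeline:
    """Chain transformers, then hand the transformed rows to the estimator."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, rows, labels):
        for t in self.transformers:
            rows = t.fit(rows).transform(rows)
        self.estimator.fit(rows, labels)
        return self

    def predict(self, rows):
        for t in self.transformers:
            rows = t.transform(rows)
        return self.estimator.predict(rows)


# Invented toy rows (not real Titanic records); label 1 = survived, 0 = did not.
train_rows = [
    {"sex": "female", "age": 29},
    {"sex": "male", "age": None},
    {"sex": "female", "age": None},
    {"sex": "male", "age": 40},
]
train_labels = [1, 0, 1, 0]

pipeline = Pipeline([FillMissingAge(), EncodeSex()], DecisionStump("sex"))
pipeline.fit(train_rows, train_labels)
predictions = pipeline.predict([{"sex": "female", "age": None}, {"sex": "male", "age": 22}])
```

The benefit of wrapping the steps in one object is that the statistics learned during fitting (here, the mean age) are reused unchanged at prediction time, so new rows pass through exactly the same preprocessing as the training rows.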

machine learning pipeline, pipeline, transformer, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.37)

Accelerated and interpretable oblique random survival forests

Jaeger, Byron C., Welden, Sawyer, Lenoir, Kristin, Speiser, Jaime L., Segar, Matthew W., Pandey, Ambarish, Pajewski, Nicholas M.

arXiv.org Machine Learning, Aug-1-2022

The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors to create branches, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles often have higher prediction accuracy than standard RSF ensembles. However, assessing all possible linear combinations of predictors induces significant computational overhead that limits applications to large-scale data sets. In addition, few methods have been developed for interpretation of oblique RSF ensembles, and they remain more difficult to interpret compared to their axis-based counterparts. We introduce a method to increase computational efficiency of the oblique RSF and a method to estimate importance of individual predictor variables with the oblique RSF. Our strategy to reduce computational overhead makes use of Newton-Raphson scoring, a classical optimization technique that we apply to the Cox partial likelihood function within each non-leaf node of decision trees. We estimate the importance of individual predictors for the oblique RSF by negating each coefficient used for the given predictor in linear combinations, and then computing the reduction in out-of-bag accuracy. In general benchmarking experiments, we find that our implementation of the oblique RSF is approximately 450 times faster with equivalent discrimination and superior Brier score compared to existing software for oblique RSFs. We find in simulation studies that 'negation importance' discriminates between relevant and irrelevant predictors more reliably than permutation importance, Shapley additive explanations, and a previously introduced technique to measure variable importance with oblique RSFs based on analysis of variance. Methods introduced in the current study are available in the aorsf R package.

data mining, machine learning, oblique rsf, (13 more...)

2208.01129

Country:

Europe > Netherlands > South Holland > Rotterdam (0.04)
North America > United States > North Carolina > Forsyth County > Winston-Salem (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.88)

Estimating a Book's Publication Date with Artificial Intelligence

#artificialintelligence, Jul-31-2022, 18:10:08 GMT

You're probably aware of AI's increasing ability to analyze and synthesize human language, such as the recent controversy over whether a Google chatbot is, in fact, sentient (Google claims -- and I'm inclined to believe -- that the chatbot is just very, very good at recognizing and replicating speech patterns). Since AI is so skilled at analyzing language, I wondered whether it could detect changes in language over time. Could it differentiate between texts written in, say, the 12th century and the 18th century? As it turns out, it can! To build this model, I used natural language processing, the branch of machine learning dedicated to (you guessed it!)

decision tree, vector, word model, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.54)

ANOVA-based Automatic Attribute Selection and a Predictive Model for Heart Disease Prognosis

Chowdhury, Mohammed Nowshad Ruhani, Zhang, Wandong, Akilan, Thangarajah

arXiv.org Artificial Intelligence, Jul-30-2022

Studies show that cardiovascular diseases (CVDs) are malignant for human health. Thus, it is important to have an efficient way of CVD prognosis. In response to this, the healthcare industry has adopted machine learning-based smart solutions to alleviate the manual process of CVD prognosis. Thus, this work proposes an information fusion technique that combines key attributes of a person through analysis of variance (ANOVA) and domain experts' knowledge. It also introduces a new collection of CVD data samples for emerging research. Thirty-eight experiments are conducted exhaustively to verify the performance of the proposed framework on four publicly available benchmark datasets and the dataset newly created in this work. The ablation study shows that the proposed approach can achieve a competitive mean average accuracy (mAA) of 99.2% and a mean average AUC of 97.9%.

classifier, dataset, feature selection, (16 more...)

2208.00296

Country:

Europe > Switzerland (0.06)
Asia > Bangladesh > Sylhet Division > Sylhet District > Sylhet (0.04)
North America > Canada > Ontario > Thunder Bay (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
(4 more...)

SHAP for additively modeled features in a boosted trees model

Mayer, Michael

arXiv.org Artificial Intelligence, Jul-29-2022

An important technique to explore a black-box machine learning (ML) model is called SHAP (SHapley Additive exPlanation). SHAP values decompose predictions into contributions of the features in a fair way. We will show that for a boosted trees model with some or all features being additively modeled, the SHAP dependence plot of such a feature corresponds to its partial dependence plot up to a vertical shift. We illustrate the result with XGBoost.

artificial intelligence, decision tree learning, machine learning, (18 more...)

2207.1449

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)

Classification of FIB/SEM-tomography images for highly porous multiphase materials using random forest classifiers

Osenberg, Markus, Hilger, André, Neumann, Matthias, Wagner, Amalia, Bohn, Nicole, Binder, Joachim R., Schmidt, Volker, Banhart, John, Manke, Ingo

arXiv.org Artificial Intelligence, Jul-28-2022

FIB/SEM tomography represents an indispensable tool for the characterization of three-dimensional nanostructures in battery research and many other fields. However, contrast and 3D classification/reconstruction problems occur in many cases, which strongly limits the applicability of the technique, especially on porous materials like those used for electrode materials in batteries or fuel cells. Distinguishing the different components, like active Li storage particles and carbon/binder materials, is difficult and often prevents a reliable quantitative analysis of image data, or may even lead to wrong conclusions about structure-property relationships. In this contribution, we present a novel approach for data classification in three-dimensional image data obtained by FIB/SEM tomography and its applications to NMC battery electrode materials. We use two different image signals, namely the signal of the angled SE2 chamber detector and the Inlens detector signal, combine both signals and train a random forest, i.e. a particular machine learning algorithm. We demonstrate that this approach can overcome current limitations of existing techniques suitable for multi-phase measurements and that it allows for quantitative data reconstruction even where current state-of-the-art techniques fail or demand large training sets. This approach may serve as a guideline for future research using FIB/SEM tomography.

artificial intelligence, classification, machine learning, (18 more...)

2207.14114

Country: Europe > Germany (0.46)

Genre: Research Report > Promising Solution (0.54)

Industry:

Energy > Energy Storage (1.00)
Energy > Oil & Gas > Upstream (0.98)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)
