AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Human Limits in Machine Learning: Prediction of Plant Phenotypes Using Soil Microbiome Data

Aghdam, Rosa, Tang, Xudong, Shan, Shan, Lankau, Richard, Solís-Lemus, Claudia

arXiv.org Artificial IntelligenceJun-19-2023

The preservation of soil health has been identified as one of the main challenges of the XXI century given its vast (and potentially threatening) ramifications in agriculture, human health and biodiversity. Here, we provide the first deep investigation of the predictive potential of machine-learning models to understand the connections between soil and biological phenotypes. Indeed, we investigate an integrative framework performing accurate machine-learning-based prediction of plant phenotypes from biological, chemical and physical properties of the soil via two models: random forest and Bayesian neural network. We show that prediction is improved, as evidenced by higher weighted F1 scores, when incorporating into the models environmental features like soil physicochemical properties and microbial population density in addition to the microbiome information. Furthermore, by exploring multiple data preprocessing strategies such as normalization, zero replacement, and data augmentation, we confirm that human decisions have a huge impact on the predictive performance. In particular, we show that the naive total sum scaling normalization that is commonly used in microbiome research is not the optimal strategy to maximize predictive power. In addition, we find that accurately defined labels are more important than normalization, taxonomic level or model characteristics. That is, if humans are unable to classify the samples and provide accurate labels, the performance of machine-learning models will be limited. Lastly, we present strategies for domain scientists via a full model selection decision tree to identify the human choices that maximize the prediction power of the models. Our work is accompanied by open source reproducible scripts (https://github.com/solislemuslab/soil-microbiome-nn) for maximum outreach among the microbiome research community.

artificial intelligence, machine learning, replacement strategy, (16 more...)

arXiv.org Artificial Intelligence

2306.11157

Country:

North America > United States > Wisconsin > Dane County > Madison (0.28)
North America > United States > Minnesota (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine

Blümlein, Theresa, Persson, Joel, Feuerriegel, Stefan

arXiv.org Artificial IntelligenceJun-19-2023

Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health record and is thus of direct relevance for personalized medicine.

data mining, machine learning, treatment decision, (18 more...)

arXiv.org Artificial Intelligence

2204.07124

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.95)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Data Science > Data Mining (0.93)

Add feedback

Ensemble Framework for Cardiovascular Disease Prediction

Tiwari, Achyut, Chugh, Aryan, Sharma, Aman

arXiv.org Artificial IntelligenceJun-16-2023

Heart disease is the major cause of non-communicable and silent death worldwide. Heart diseases or cardiovascular diseases are classified into four types: coronary heart disease, heart failure, congenital heart disease, and cardiomyopathy. It is vital to diagnose heart disease early and accurately in order to avoid further injury and save patients' lives. As a result, we need a system that can predict cardiovascular disease before it becomes a critical situation. Machine learning has piqued the interest of researchers in the field of medical sciences. For heart disease prediction, researchers implement a variety of machine learning methods and approaches. In this work, to the best of our knowledge, we have used the dataset from IEEE Data Port which is one of the online available largest datasets for cardiovascular diseases individuals. The dataset isa combination of Hungarian, Cleveland, Long Beach VA, Switzerland & Statlog datasets with important features such as Maximum Heart Rate Achieved, Serum Cholesterol, Chest Pain Type, Fasting blood sugar, and so on. To assess the efficacy and strength of the developed model, several performance measures are used, such as ROC, AUC curve, specificity, F1-score, sensitivity, MCC, and accuracy. In this study, we have proposed a framework with a stacked ensemble classifier using several machine learning algorithms including ExtraTrees Classifier, Random Forest, XGBoost, and so on. Our proposed framework attained an accuracy of 92.34% which is higher than the existing literature.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.compbiomed.2022.105624

2306.09989

Country:

Europe > Switzerland (0.24)
North America > United States (0.14)
Asia > Singapore (0.04)
Asia > India > Himachal Pradesh (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

A Metaheuristic-based Machine Learning Approach for Energy Prediction in Mobile App Development

Mousavirad, Seyed Jalaleddin, Alexandre, Luís A.

arXiv.org Artificial IntelligenceJun-16-2023

Energy consumption plays a vital role in mobile App development for developers and end-users, and it is considered one of the most crucial factors for purchasing a smartphone. In addition, in terms of sustainability, it is essential to find methods to reduce the energy consumption of mobile devices since the extensive use of billions of smartphones worldwide significantly impacts the environment. Despite the existence of several energy-efficient programming practices in Android, the leading mobile ecosystem, machine learning-based energy prediction algorithms for mobile App development have yet to be reported. Therefore, this paper proposes a histogram-based gradient boosting classification machine (HGBC), boosted by a metaheuristic approach, for energy prediction in mobile App development. Our metaheuristic approach is responsible for two issues. First, it finds redundant and irrelevant features without any noticeable change in performance. Second, it performs a hyper-parameter tuning for the HGBC algorithm. Since our proposed metaheuristic approach is algorithm-independent, we selected 12 algorithms for the search strategy to find the optimal search algorithm. Our finding shows that a success-history-based parameter adaption for differential evolution with linear population size (L-SHADE) offers the best performance. It can improve performance and decrease the number of features effectively. Our extensive set of experiments clearly shows that our proposed approach can provide significant results for energy consumption prediction.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2306.09931

Country:

North America > United States (0.14)
Europe > Portugal (0.04)
Africa > Mozambique > Sofala Province > Beira (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(4 more...)

Add feedback

Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes

Daghero, Francesco, Burrello, Alessio, Macii, Enrico, Montuschi, Paolo, Poncino, Massimo, Pagliari, Daniele Jahier

arXiv.org Artificial IntelligenceJun-16-2023

With the increasing popularity of Internet of Things (IoT) devices, there is a growing need for energy-efficient Machine Learning (ML) models that can run on constrained edge nodes. Decision tree ensembles, such as Random Forests (RFs) and Gradient Boosting (GBTs), are particularly suited for this task, given their relatively low complexity compared to other alternatives. However, their inference time and energy costs are still significant for edge hardware. Given that said costs grow linearly with the ensemble size, this paper proposes the use of dynamic ensembles, that adjust the number of executed trees based both on a latency/energy target and on the complexity of the processed input, to trade-off computational cost and accuracy. We focus on deploying these algorithms on multi-core low-power IoT devices, designing a tool that automatically converts a Python ensemble into optimized C code, and exploring several optimizations that account for the available parallelism and memory hierarchy. We extensively benchmark both static and dynamic RFs and GBTs on three state-of-the-art IoT-relevant datasets, using an 8-core ultra-lowpower System-on-Chip (SoC), GAP8, as the target platform. Thanks to the proposed early-stopping mechanisms, we achieve an energy reduction of up to 37.9% with respect to static GBTs (8.82 uJ vs 14.20 uJ per inference) and 41.7% with respect to static RFs (2.86 uJ vs 4.90 uJ per inference), without losing accuracy compared to the static model.

artificial intelligence, ensemble, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/JIOT.2023.3286276

2306.09789

Country:

Europe > Italy > Piedmont > Turin Province > Turin (0.14)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.67)
Semiconductors & Electronics (0.66)
Information Technology > Smart Houses & Appliances (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Machine-Learned Premise Selection for Lean

Piotrowski, Bartosz, Mir, Ramon Fernández, Ayers, Edward

arXiv.org Artificial IntelligenceJun-14-2023

We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib -- Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic which can be called in an editor while constructing a proof interactively.

artificial intelligence, machine learning, random forest, (16 more...)

arXiv.org Artificial Intelligence

2304.00994

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.58)

Add feedback

Detection and classification of faults aimed at preventive maintenance of PV systems

Oviedo, Edgar Hernando Sepúlveda, Travé-Massuyès, Louise, Subias, Audine, Pavlov, Marko, Alonso, Corinne

arXiv.org Artificial IntelligenceJun-13-2023

Diagnosis in PV systems aims to detect, locate and identify faults. Diagnosing these faults is vital to guarantee energy production and extend the useful life of PV power plants. In the literature, multiple machine learning approaches have been proposed for this purpose. However, few of these works have paid special attention to the detection of fine faults and the specialized process of extraction and selection of features for their classification. A fine fault is one whose characteristic signature is difficult to distinguish to that of a healthy panel. As a contribution to the detection of fine faults (especially of the snail trail type), this article proposes an innovative approach based on the Random Forest (RF) algorithm. This approach uses a complex feature extraction and selection method that improves the computational time of fault classification while maintaining high accuracy.

artificial intelligence, classification, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.08004

Country: Europe > France > Occitanie > Haute-Garonne > Toulouse (0.07)

Genre:

Research Report (0.71)
Overview > Innovation (0.36)

Industry: Energy > Renewable > Solar (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.33)

Add feedback

Interpretable Differencing of Machine Learning Models

Haldar, Swagatam, Saha, Diptikalyan, Wei, Dennis, Nair, Rahul, Daly, Elizabeth M.

arXiv.org Artificial IntelligenceJun-13-2023

Understanding the differences between machine learning (ML) models is of interest in scenarios ranging from choosing amongst a set of competing models, to updating a deployed model with new training data. In these cases, we wish to go beyond differences in overall metrics such as accuracy to identify where in the feature space do the differences occur. We formalize this problem of model differencing as one of predicting a dissimilarity function of two ML models' outputs, subject to the representation of the differences being human-interpretable. Our solution is to learn a Joint Surrogate Tree (JST), which is composed of two conjoined decision tree surrogates for the two models. A JST provides an intuitive representation of differences and places the changes in the context of the models' decision logic. Context is important as it helps users to map differences to an underlying mental model of an AI system. We also propose a refinement procedure to increase the precision of a JST. We demonstrate, through an empirical evaluation, that such contextual differencing is concise and can be achieved with no loss in fidelity over naive approaches.

artificial intelligence, direct dt, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2306.06473

Country:

North America > United States > New York (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
North America > Canada (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Prediction Algorithms Achieving Bayesian Decision Theoretical Optimality Based on Decision Trees as Data Observation Processes

Nakahara, Yuta, Saito, Shota, Ichijo, Naoki, Kazama, Koki, Matsushima, Toshiyasu

arXiv.org Artificial IntelligenceJun-12-2023

In the field of decision trees, most previous studies have difficulty ensuring the statistical optimality of a prediction of new data and suffer from overfitting because trees are usually used only to represent prediction functions to be constructed from given data. In contrast, some studies, including this paper, used the trees to represent stochastic data observation processes behind given data. Moreover, they derived the statistically optimal prediction, which is robust against overfitting, based on the Bayesian decision theory by assuming a prior distribution for the trees. However, these studies still have a problem in computing this Bayes optimal prediction because it involves an infeasible summation for all division patterns of a feature space, which is represented by the trees and some parameters. In particular, an open problem is a summation with respect to combinations of division axes, i.e., the assignment of features to inner nodes of the tree. We solve this by a Markov chain Monte Carlo method, whose step size is adaptively tuned according to a posterior distribution for the trees.

artificial intelligence, machine learning, proposal distribution, (20 more...)

arXiv.org Artificial Intelligence

2306.0706

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

CARNA: Characterizing Advanced heart failure Risk and hemodyNAmic phenotypes using learned multi-valued decision diagrams

Lamp, Josephine, Wu, Yuxin, Lamp, Steven, Afriyie, Prince, Bilchick, Kenneth, Feng, Lu, Mazimba, Sula

arXiv.org Artificial IntelligenceJun-11-2023

Early identification of high risk heart failure (HF) patients is key to timely allocation of life-saving therapies. Hemodynamic assessments can facilitate risk stratification and enhance understanding of HF trajectories. However, risk assessment for HF is a complex, multi-faceted decision-making process that can be challenging. Previous risk models for HF do not integrate invasive hemodynamics or support missing data, and use statistical methods prone to bias or machine learning methods that are not interpretable. To address these limitations, this paper presents CARNA, a hemodynamic risk stratification and phenotyping framework for advanced HF that takes advantage of the explainability and expressivity of machine learned Multi-Valued Decision Diagrams (MVDDs). This interpretable framework learns risk scores that predict the probability of patient outcomes, and outputs descriptive patient phenotypes (sets of features and thresholds) that characterize each predicted risk score. CARNA incorporates invasive hemodynamics and can make predictions on missing data. The CARNA models were trained and validated using a total of five advanced HF patient cohorts collected from previous trials, and compared with six established HF risk scores and three traditional ML risk models. CARNA provides robust risk stratification, outperforming all previous benchmarks. Although focused on advanced HF, the CARNA framework is general purpose and can be used to learn risk stratifications for other diseases and medical applications.

artificial intelligence, decision tree learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2306.06801

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
North America > United States > New York (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)
(2 more...)

Add feedback