AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Assessing Alcohol Use Disorder: Insights from Lifestyle, Background, and Family History with Machine Learning Techniques

Wang, Chenlan, Huang, Gaojian, Luo, Yue

arXiv.org Artificial IntelligenceOct-23-2024

This study explored how lifestyle, personal background, and family history contribute to the risk of developing Alcohol Use Disorder (AUD). Survey data from the All of Us Program was utilized to extract information on AUD status, lifestyle, personal background, and family history for 6,016 participants. Key determinants of AUD were identified using decision trees including annual income, recreational drug use, length of residence, sex/gender, marital status, education level, and family history of AUD. Data visualization and Chi-Square Tests of Independence were then used to assess associations between identified factors and AUD. Afterwards, machine learning techniques including decision trees, random forests, and Naive Bayes were applied to predict an individual's likelihood of developing AUD. Random forests were found to achieve the highest accuracy (82%), compared to Decision Trees and Naive Bayes. Findings from this study can offer insights that help parents, healthcare professionals, and educators develop strategies to reduce AUD risk, enabling early intervention and targeted prevention efforts.

artificial intelligence, decision tree learning, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2410.18354

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of $\gamma_j$

Chen, Hung-Hsuan

arXiv.org Artificial IntelligenceOct-23-2024

The Gradient Boosting Classifier (GBC) is a widely used machine learning algorithm for binary classification, which builds decision trees iteratively to minimize prediction errors. This document explains the GBC's training and prediction processes, focusing on the computation of terminal node values $\gamma_j$, which are crucial to optimizing the logistic loss function. We derive $\gamma_j$ through a Taylor series approximation and provide a step-by-step pseudocode for the algorithm's implementation. The guide explains the theory of GBC and its practical application, demonstrating its effectiveness in binary classification tasks. We provide a step-by-step example in the appendix to help readers understand.

artificial intelligence, machine learning, prediction, (16 more...)

arXiv.org Artificial Intelligence

2410.05623

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)

Add feedback

1-2-3-Go! Policy Synthesis for Parameterized Markov Decision Processes via Decision-Tree Learning and Generalization

Azeem, Muqsit, Chakraborty, Debraj, Kanav, Sudeep, Kretinsky, Jan, Mohagheghi, Mohammadsadegh, Mohr, Stefanie, Weininger, Maximilian

arXiv.org Artificial IntelligenceOct-23-2024

Despite the advances in probabilistic model checking, the scalability of the verification methods remains limited. In particular, the state space often becomes extremely large when instantiating parameterized Markov decision processes (MDPs) even with moderate values. Synthesizing policies for such \emph{huge} MDPs is beyond the reach of available tools. We propose a learning-based approach to obtain a reasonable policy for such huge MDPs. The idea is to generalize optimal policies obtained by model-checking small instances to larger ones using decision-tree learning. Consequently, our method bypasses the need for explicit state-space exploration of large models, providing a practical solution to the state-space explosion problem. We demonstrate the efficacy of our approach by performing extensive experimentation on the relevant models from the quantitative verification benchmark set. The experimental results indicate that our policies perform well, even when the size of the model is orders of magnitude beyond the reach of state-of-the-art analysis tools.

artificial intelligence, machine learning, min 0, (18 more...)

arXiv.org Artificial Intelligence

2410.18293

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Austria > Vienna (0.14)
(8 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space

Chen, Jingdi, Zhou, Hanhan, Mei, Yongsheng, Joe-Wong, Carlee, Adam, Gina, Bastian, Nathaniel D., Lan, Tian

arXiv.org Artificial IntelligenceOct-21-2024

Deep Reinforcement Learning (DRL) algorithms have achieved great success in solving many challenging tasks while their black-box nature hinders interpretability and real-world applicability, making it difficult for human experts to interpret and understand DRL policies. Existing works on interpretable reinforcement learning have shown promise in extracting decision tree (DT) based policies from DRL policies with most focus on the single-agent settings while prior attempts to introduce DT policies in multi-agent scenarios mainly focus on heuristic designs which do not provide any quantitative guarantees on the expected return. In this paper, we establish an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy. This enables us to recast the DT extraction problem into a novel non-euclidean clustering problem over the local observation and action values space of each agent, with action values as cluster labels and the upper bound on the return gap as clustering loss. Both the algorithm and the upper bound are extended to multi-agent decentralized DT extractions by an iteratively-grow-DT procedure guided by an action-value function conditioned on the current DTs of other agents. Further, we propose the Return-Gap-Minimization Decision Tree (RGMDT) algorithm, which is a surprisingly simple design and is integrated with reinforcement learning through the utilization of a novel Regularized Information Maximization loss. Evaluations on tasks like D4RL show that RGMDT significantly outperforms heuristic DT-based baselines and can achieve nearly optimal returns under given DT complexity constraints (e.g., maximum number of DT nodes).

machine learning, node, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2410.16517

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Interpreting Microbiome Relative Abundance Data Using Symbolic Regression

Haldar, Swagatam, Stein-Thoeringer, Christoph, Borisov, Vadim

arXiv.org Artificial IntelligenceOct-18-2024

Understanding the complex interactions within the microbiome is crucial for developing effective diagnostic and therapeutic strategies. Traditional machine learning models often lack interpretability, which is essential for clinical and biological insights. This paper explores the application of symbolic regression (SR) to microbiome relative abundance data, with a focus on colorectal cancer (CRC). SR, known for its high interpretability, is compared against traditional machine learning models, e.g., random forest, gradient boosting decision trees. These models are evaluated based on performance metrics such as F1 score and accuracy. We utilize 71 studies encompassing, from various cohorts, over 10,000 samples across 749 species features. Our results indicate that SR not only competes reasonably well in terms of predictive performance, but also excels in model interpretability. SR provides explicit mathematical expressions that offer insights into the biological relationships within the microbiome, a crucial advantage for clinical and biological interpretation. Our experiments also show that SR can help understand complex models like XGBoost via knowledge distillation. To aid in reproducibility and further research, we have made the code openly available at https://github.com/swag2198/microbiome-symbolic-regression .

artificial intelligence, machine learning, microbiome, (13 more...)

arXiv.org Artificial Intelligence

2410.16109

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Oklahoma (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (0.89)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)

Add feedback

An explainable machine learning approach for energy forecasting at the household level

Béraud, Pauline, Rioux, Margaux, Babany, Michel, de La Chevasnerie, Philippe, Theis, Damien, Teodori, Giacomo, Pinguet, Chloé, Rigaud, Romane, Leclerc, François

arXiv.org Artificial IntelligenceOct-18-2024

Electricity forecasting has been a recurring research topic, as it is key to finding the right balance between production and consumption. While most papers are focused on the national or regional scale, few are interested in the household level. Desegregated forecast is a common topic in Machine Learning (ML) literature but lacks explainability that household energy forecasts require. This paper specifically targets the challenges of forecasting electricity use at the household level. This paper confronts common Machine Learning algorithms to electricity household forecasts, weighing the pros and cons, including accuracy and explainability with well-known key metrics. Furthermore, we also confront them in this paper with the business challenges specific to this sector such as explainability or outliers resistance. We introduce a custom decision tree, aiming at providing a fair estimate of the energy consumption, while being explainable and consistent with human intuition. We show that this novel method allows greater explainability without sacrificing much accuracy. The custom tree methodology can be used in various business use cases but is subject to limitations, such as a lack of resilience with outliers.

artificial intelligence, decision tree learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2410.14416

Country: Europe > France (0.05)

Genre: Research Report (0.70)

Industry: Energy > Power Industry (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.54)

Add feedback

Explainable Moral Values: a neuro-symbolic approach to value classification

Lazzari, Nicolas, De Giorgis, Stefano, Gangemi, Aldo, Presutti, Valentina

arXiv.org Artificial IntelligenceOct-16-2024

This work explores the integration of ontology-based reasoning and Machine Learning techniques for explainable value classification. By relying on an ontological formalization of moral values as in the Moral Foundations Theory, relying on the DnS Ontology Design Pattern, the \textit{sandra} neuro-symbolic reasoner is used to infer values (fomalized as descriptions) that are \emph{satisfied by} a certain sentence. Sentences, alongside their structured representation, are automatically generated using an open-source Large Language Model. The inferred descriptions are used to automatically detect the value associated with a sentence. We show that only relying on the reasoner's inference results in explainable classification comparable to other more complex approaches. We show that combining the reasoner's inferences with distributional semantics methods largely outperforms all the baselines, including complex models based on neural network architectures. Finally, we build a visualization tool to explore the potential of theory-based values classification, which is publicly available at http://xmv.geomeaning.com/.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.12631

Country:

Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Pennsylvania (0.04)
(5 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.77)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
(3 more...)

Add feedback

Global Censored Quantile Random Forest

Zhou, Siyu, Peng, Limin

arXiv.org Machine LearningOct-16-2024

In recent years, censored quantile regression has enjoyed an increasing popularity for survival analysis while many existing works rely on linearity assumptions. In this work, we propose a Global Censored Quantile Random Forest (GCQRF) for predicting a conditional quantile process on data subject to right censoring, a forest-based flexible, competitive method able to capture complex nonlinear relationships. Taking into account the randomness in trees and connecting the proposed method to a randomized incomplete infinite degree U-process (IDUP), we quantify the prediction process' variation without assuming an infinite forest and establish its weak convergence. Moreover, feature importance ranking measures based on out-of-sample predictive accuracy are proposed. We demonstrate the superior predictive accuracy of the proposed method over a number of existing alternatives and illustrate the use of the proposed importance ranking measures on both simulated and real data.

gcqrf, quantile, random forest, (14 more...)

arXiv.org Machine Learning

2410.12209

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (1.00)

Industry: Law > Civil Rights & Constitutional Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)

Add feedback

Conditional Density Estimation with Histogram Trees

Yang, Lincen, van Leeuwen, Matthijs

arXiv.org Artificial IntelligenceOct-15-2024

Conditional density estimation (CDE) goes beyond regression by modeling the full conditional distribution, providing a richer understanding of the data than just the conditional mean in regression. This makes CDE particularly useful in critical application domains. However, interpretable CDE methods are understudied. Current methods typically employ kernel-based approaches, using kernel functions directly for kernel density estimation or as basis functions in linear models. In contrast, despite their conceptual simplicity and visualization suitability, tree-based methods -- which are arguably more comprehensible -- have been largely overlooked for CDE tasks. Thus, we propose the Conditional Density Tree (CDTree), a fully non-parametric model consisting of a decision tree in which each leaf is formed by a histogram model. Specifically, we formalize the problem of learning a CDTree using the minimum description length (MDL) principle, which eliminates the need for tuning the hyperparameter for regularization. Next, we propose an iterative algorithm that, although greedily, searches the optimal histogram for every possible node split. Our experiments demonstrate that, in comparison to existing interpretable CDE methods, CDTrees are both more accurate (as measured by the log-loss) and more robust against irrelevant features. Further, our approach leads to smaller tree sizes than existing tree-based models, which benefits interpretability.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2410.11449

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)

Add feedback

Generating Global and Local Explanations for Tree-Ensemble Learning Methods by Answer Set Programming

Takemura, Akihiro, Inoue, Katsumi

arXiv.org Artificial IntelligenceOct-14-2024

We propose a method for generating rule sets as global and local explanations for tree-ensemble learning methods using Answer Set Programming (ASP). To this end, we adopt a decompositional approach where the split structures of the base decision trees are exploited in the construction of rules, which in turn are assessed using pattern mining methods encoded in ASP to extract explanatory rules. For global explanations, candidate rules are chosen from the entire trained tree-ensemble models, whereas for local explanations, candidate rules are selected by only considering rules that are relevant to the particular predicted instance. We show how user-defined constraints and preferences can be represented declaratively in ASP to allow for transparent and flexible rule set generation, and how rules can be used as explanations to help the user better understand the models. Experimental evaluation with real-world datasets and popular tree-ensemble algorithms demonstrates that our approach is applicable to a wide range of classification tasks.

explanation, logic & formal reasoning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1017/S1471068424000401

2410.11

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
(5 more...)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
(5 more...)

Add feedback