AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Empowering Decision Trees via Shape Function Branching

Upadhya, Nakul, Cohen, Eldan

arXiv.org Artificial IntelligenceOct-23-2025

Decision trees are prized for their interpretability and strong performance on tabular data. Yet, their reliance on simple axis-aligned linear splits often forces deep, complex structures to capture non-linear feature effects, undermining human comprehension of the constructed tree. To address this limitation, we propose a novel generalization of a decision tree, the Shape Generalized Tree (SGT), in which each internal node applies a learnable axis-aligned shape function to a single feature, enabling rich, non-linear partitioning in one split. As users can easily visualize each node's shape function, SGTs are inherently interpretable and provide intuitive, visual explanations of the model's decision mechanisms. To learn SGTs from data, we propose ShapeCART, an efficient induction algorithm for SGTs. We further extend the SGT framework to bivariate shape functions (S$^2$GT) and multi-way trees (SGT$_K$), and present Shape$^2$CART and ShapeCART$_K$, extensions to ShapeCART for learning S$^2$GTs and SGT$_K$s, respectively. Experiments on various datasets show that SGTs achieve superior performance with reduced model size compared to traditional axis-aligned linear trees.

artificial intelligence, machine learning, sgt 3, (16 more...)

arXiv.org Artificial Intelligence

2510.1904

Country: North America > United States (0.67)

Genre:

Research Report > Experimental Study (0.45)
Research Report > New Finding (0.45)

Industry: Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.45)

Add feedback

ACT: Agentic Classification Tree

Grari, Vincent, Arni, Tim, Laugel, Thibault, Lamprier, Sylvain, Zou, James, Detyniecki, Marcin

arXiv.org Artificial IntelligenceOct-23-2025

When used in high-stakes settings, AI systems are expected to produce decisions that are transparent, interpretable, and auditable, a requirement increasingly expected by regulations. Decision trees such as CART provide clear and verifiable rules, but they are restricted to structured tabular data and cannot operate directly on unstructured inputs such as text. In practice, large language models (LLMs) are widely used for such data, yet prompting strategies such as chain-of-thought or prompt optimization still rely on free-form reasoning, limiting their ability to ensure trustworthy behaviors. We present the Agentic Classification Tree (ACT), which extends decision-tree methodology to unstructured inputs by formulating each split as a natural-language question, refined through impurity-based evaluation and LLM feedback via TextGrad. Experiments on text benchmarks show that ACT matches or surpasses prompting-based baselines while producing transparent and interpretable decision paths.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.26433

Country: Europe (1.00)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Consumer Health (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)

Add feedback

Leveraging Association Rules for Better Predictions and Better Explanations

Audemard, Gilles, Coste-Marquis, Sylvie, Marquis, Pierre, Sabiri, Mehdi, Szczepanski, Nicolas

arXiv.org Artificial IntelligenceOct-22-2025

We present a new approach to classification that combines data and knowledge. In this approach, data mining is used to derive association rules (possibly with negations) from data. Those rules are leveraged to increase the predictive performance of tree-based models (decision trees and random forests) used for a classification task. They are also used to improve the corresponding explanation task through the generation of abductive explanations that are more general than those derivable without taking such rules into account. Experiments show that for the two tree-based models under consideration, benefits can be offered by the approach in terms of predictive performance and in terms of explanation sizes.

artificial intelligence, explanation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.18628

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)

Add feedback

A Rectification-Based Approach for Distilling Boosted Trees into Decision Trees

Audemard, Gilles, Coste-Marquis, Sylvie, Marquis, Pierre, Sabiri, Mehdi, Szczepanski, Nicolas

arXiv.org Artificial IntelligenceOct-22-2025

We present a new approach for distilling boosted trees into decision trees, in the objective of generating an ML model offering an acceptable compromise in terms of predictive performance and interpretability. We explain how the correction approach called rectification can be used to implement such a distillation process. We show empirically that this approach provides interesting results, in comparison with an approach to distillation achieved by retraining the model.

artificial intelligence, decision tree learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.18615

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Interval Prediction of Annual Average Daily Traffic on Local Roads via Quantile Random Forest with High-Dimensional Spatial Data

Yao, Ying, Graham, Daniel J.

arXiv.org Machine LearningOct-22-2025

Accurate annual average daily traffic (AADT) data are vital for transport planning and infrastructure management. However, automatic traffic detectors across national road networks often provide incomplete coverage, leading to underrepresentation of minor roads. While recent machine learning advances have improved AADT estimation at unmeasured locations, most models produce only point predictions and overlook estimation uncertainty. This study addresses that gap by introducing an interval prediction approach that explicitly quantifies predictive uncertainty. We integrate a Quantile Random Forest model with Principal Component Analysis to generate AADT prediction intervals, providing plausible traffic ranges bounded by estimated minima and maxima. Using data from over 2,000 minor roads in England and Wales, and evaluated with specialized interval metrics, the proposed method achieves an interval coverage probability of 88.22%, a normalized average width of 0.23, and a Winkler Score of 7,468.47. By combining machine learning with spatial and high-dimensional analysis, this framework enhances both the accuracy and interpretability of AADT estimation, supporting more robust and informed transport planning.

artificial intelligence, machine learning, prediction, (20 more...)

arXiv.org Machine Learning

2510.18548

Country:

North America > United States (1.00)
Asia (1.00)
Europe > United Kingdom > England (0.48)

Genre:

Research Report > New Finding (0.93)
Overview (0.93)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Prediction-Augmented Trees for Reliable Statistical Inference

Kher, Vikram, Oikonomou, Argyris, Zampetakis, Manolis

arXiv.org Machine LearningOct-21-2025

The remarkable success of machine learning (ML) in predictive tasks has led scientists to incorporate ML predictions as a core component of the scientific discovery pipeline. This was exemplified by the landmark achievement of AlphaFold (Jumper et al. (2021)). In this paper, we study how ML predictions can be safely used in statistical analysis of data towards scientific discovery. In particular, we follow the framework introduced by Angelopoulos et al. (2023). In this framework, we assume access to a small set of $n$ gold-standard labeled samples, a much larger set of $N$ unlabeled samples, and a ML model that can be used to impute the labels of the unlabeled data points. We introduce two new learning-augmented estimators: (1) Prediction-Augmented Residual Tree (PART), and (2) Prediction-Augmented Quadrature (PAQ). Both estimators have significant advantages over existing estimators like PPI and PPI++ introduced by Angelopoulos et al. (2023) and Angelopoulos et al. (2024), respectively. PART is a decision-tree based estimator built using a greedy criterion. We first characterize PART's asymptotic distribution and demonstrate how to construct valid confidence intervals. Then we show that PART outperforms existing methods in real-world datasets from ecology, astronomy, and census reports, among other domains. This leads to estimators with higher confidence, which is the result of using both the gold-standard samples and the machine learning predictions. Finally, we provide a formal proof of the advantage of PART by exploring PAQ, an estimation that arises when considering the limit of PART when the depth its tree grows to infinity. Under appropriate assumptions in the input data we show that the variance of PAQ shrinks at rate of $O(N^{-1} + n^{-4})$, improving significantly on the $O(N^{-1}+n^{-1})$ rate of existing methods.

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Machine Learning

2510.16937

Country:

North America > United States > California (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Programmatic Representation Learning with Language Models

Poesia, Gabriel, Sampaio, Georgia Gabriela

arXiv.org Artificial IntelligenceOct-17-2025

Classical models for supervised machine learning, such as decision trees, are efficient and interpretable predictors, but their quality is highly dependent on the particular choice of input features. Although neural networks can learn useful representations directly from raw data (e.g., images or text), this comes at the expense of interpretability and the need for specialized hardware to run them efficiently. In this paper, we explore a hypothesis class we call Learned Programmatic Representations (LeaPR) models, which stack arbitrary features represented as code (functions from data points to scalars) and decision tree predictors. We synthesize feature functions using Large Language Models (LLMs), which have rich prior knowledge in a wide range of domains and a remarkable ability to write code using existing domain-specific libraries. We propose two algorithms to learn LeaPR models from supervised data. First, we design an adaptation of FunSearch to learn features rather than directly generate predictors. Then, we develop a novel variant of the classical ID3 algorithm for decision tree learning, where new features are generated on demand when splitting leaf nodes. In experiments from chess position evaluation to image and text classification, our methods learn high-quality, neural network-free predictors often competitive with neural networks. Our work suggests a flexible paradigm for learning interpretable representations end-to-end where features and predictions can be readily inspected and understood.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2510.14825

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Efficient & Correct Predictive Equivalence for Decision Trees

Marques-Silva, Joao, Ignatiev, Alexey

arXiv.org Artificial IntelligenceOct-17-2025

The Rashomon set of decision trees (DTs) finds importance uses. Recent work showed that DTs computing the same classification function, i.e. predictive equivalent DTs, can represent a significant fraction of the Rashomon set. Such redundancy is undesirable. For example, feature importance based on the Rashomon set becomes inaccurate due the existence of predictive equivalent DTs, i.e. DTs with the same prediction for every possible input. In recent work, McTavish et al. proposed solutions for several computational problems related with DTs, including that of deciding predictive equivalent DTs. The approach of McTavish et al. consists of applying the well-known method of Quine-McCluskey (QM) for obtaining minimum-size DNF (disjunctive normal form) representations of DTs, which are then used for comparing DTs for predictive equivalence. Furthermore, the minimum-size DNF representation was also applied to computing explanations for the predictions made by DTs, and to finding predictions in the presence of missing data. However, the problem of formula minimization is hard for the second level of the polynomial hierarchy, and the QM method may exhibit worst-case exponential running time and space. This paper first demonstrates that there exist decision trees that trigger the worst-case exponential running time and space of the QM method. Second, the paper shows that the QM method may incorrectly decide predictive equivalence, if two key constraints are not respected, and one may be difficult to formally guarantee. Third, the paper shows that any of the problems to which the smallest DNF representation has been applied to can be solved in polynomial time, in the size of the DT. The experiments confirm that, for DTs for which the worst-case of the QM method is triggered, the algorithms proposed in this paper are orders of magnitude faster than the ones proposed by McTavish et al.

artificial intelligence, machine learning, predictive equivalence, (17 more...)

arXiv.org Artificial Intelligence

2509.17774

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

Add feedback

Leveraging Predictive Equivalence in Decision Trees

McTavish, Hayden, Boner, Zachery, Donnelly, Jon, Seltzer, Margo, Rudin, Cynthia

arXiv.org Artificial IntelligenceOct-15-2025

Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree's decision boundary can be represented by many different decision trees. The presence of models with identical decision boundaries but different evaluation processes makes model selection challenging. The models will have different variable importance and behave differently in the presence of missing values, but most optimization procedures will arbitrarily choose one such model to return. We present a boolean logical representation of decision trees that does not exhibit predictive equivalence and is faithful to the underlying decision boundary. We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence's impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions.

artificial intelligence, decision tree learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2506.14143

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Improving Decision Trees through the Lens of Parameterized Local Search

Harviainen, Juha, Sommer, Frank, Sorge, Manuel

arXiv.org Artificial IntelligenceOct-15-2025

Algorithms for learning decision trees often include heuristic local-search operations such as (1) adjusting the threshold of a cut or (2) also exchanging the feature of that cut. We study minimizing the number of classification errors by performing a fixed number of a single type of these operations. Although we discover that the corresponding problems are NP-complete in general, we provide a comprehensive parameterized-complexity analysis with the aim of determining those properties of the problems that explain the hardness and those that make the problems tractable. For instance, we show that the problems remain hard for a small number $d$ of features or small domain size $D$ but the combination of both yields fixed-parameter tractability. That is, the problems are solvable in $(D + 1)^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the size of the input. We also provide a proof-of-concept implementation of this algorithm and report on empirical results.

artificial intelligence, machine learning, threshold, (18 more...)

arXiv.org Artificial Intelligence

2510.12726

Country: Europe > Austria (0.27)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback