AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Asymptotic confidence bands for centered purely random forests

Neumeyer, Natalie, Rabe, Jan, Trabs, Mathias

arXiv.org Machine LearningNov-18-2025

In a multivariate nonparametric regression setting we construct explicit asymptotic uniform confidence bands for centered purely random forests. Since the most popular example in this class of random forests, namely the uniformly centered purely random forests, is well known to suffer from suboptimal rates, we propose a new type of purely random forests, called the Ehrenfest centered purely random forests, which achieve minimax optimal rates. Our main confidence band theorem applies to both random forests. The proof is based on an interpretation of random forests as generalized U-Statistics together with a Gaussian approximation of the supremum of empirical processes. Our theoretical findings are illustrated in simulation examples.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Machine Learning

2511.13199

Country:

North America > United States > New York (0.04)
North America > United States > California (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Robust Watermarking on Gradient Boosting Decision Trees

Chung, Jun Woo, Lao, Yingjie, Zhao, Weijie

arXiv.org Artificial IntelligenceNov-14-2025

Gradient Boosting Decision Trees (GBDTs) are widely used in industry and academia for their high accuracy and efficiency, particularly on structured data. However, watermarking GBDT models remains underexplored compared to neural networks. In this work, we present the first robust watermarking framework tailored to GBDT models, utilizing in-place fine-tuning to embed imperceptible and resilient watermarks. We propose four embedding strategies, each designed to minimize impact on model accuracy while ensuring watermark robustness. Through experiments across diverse datasets, we demonstrate that our methods achieve high watermark embedding rates, low accuracy degradation, and strong resistance to post-deployment fine-tuning.

artificial intelligence, machine learning, watermark, (17 more...)

arXiv.org Artificial Intelligence

2511.09822

Country: North America > United States > California (0.46)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

Add feedback

Lassoed Forests: Random Forests with Adaptive Lasso Post-selection

Shang, Jing, Bannon, James, Haibe-Kains, Benjamin, Tibshirani, Robert

arXiv.org Machine LearningNov-13-2025

Tree-based methods are a family of non-parametric approaches in supervised learning. Random forests use a form of bootstrap aggregation, or bagging, to combine a large collection of trees and produce a final prediction. In regression problems, it gives the same weight to each tree and computes the average out-of-bag prediction. In classification problems, it assigns class labels by majority vote. However, since a single-tree model is known to have high variance, a large number of trees need to be trained and aggregated in order to reduce variance (Hastie et al. 2009). This can lead to redundant trees, as the bootstrap procedure may select similar sets of samples to train different trees. Moreover, increasing the number of trees does not reduce the bias. Post-selection boosting random forests, proposed by Wang & Wang (2021), is an attempt to reduce bias by applying Lasso regression (Tibshirani 1996) on the predictions from each individual tree. The method returns a sparser forest with fewer trees, as well as different weights assigned to each individual tree.

artificial intelligence, machine learning, post-selection forest, (17 more...)

arXiv.org Machine Learning

2511.06698

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm

Nadel, Alexander, Wettenstein, Ron

arXiv.org Artificial IntelligenceNov-13-2025

SHapley Additive exPlanations (SHAP) is a key tool for interpreting decision tree ensembles by assigning contribution values to features. It is widely used in finance, advertising, medicine, and other domains. Two main approaches to SHAP calculation exist: Path-Dependent SHAP, which leverages the tree structure for efficiency, and Background SHAP, which uses a background dataset to estimate feature distributions. We introduce WOODELF, a SHAP algorithm that integrates decision trees, game theory, and Boolean logic into a unified framework. For each consumer, WOODELF constructs a pseudo-Boolean formula that captures their feature values, the structure of the decision tree ensemble, and the entire background dataset. It then leverages this representation to compute Background SHAP in linear time. WOODELF can also compute Path-Dependent SHAP, Shapley interaction values, Banzhaf values, and Banzhaf interaction values. WOODELF is designed to run efficiently on CPU and GPU hardware alike. Available via the WOODELF Python package, it is implemented using NumPy, SciPy, and CuPy without relying on custom C++ or CUDA code. This design enables fast performance and seamless integration into existing frameworks, supporting large-scale computation of SHAP and other game-theoretic values in practice. For example, on a dataset with 3,000,000 rows, 5,000,000 background samples, and 127 features, WOODELF computed all Background Shapley values in 162 seconds on CPU and 16 seconds on GPU - compared to 44 minutes required by the best method on any hardware platform, representing 16x and 165x speedups, respectively.

artificial intelligence, formula, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.09376

Country: Asia (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.93)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Pattern Recognition of Scrap Plastic Misclassification in Global Trade Data

Ramli, Muhammad Sukri Bin

arXiv.org Artificial IntelligenceNov-13-2025

We propose an interpretable machine learning framework to help identify trade data discrepancies that are challenging to detect with traditional methods. Our system analyzes trade data to find a novel inverse price-volume signature, a pattern where reported volumes increase as average unit prices decrease. The model achieves 0.9375 accuracy and was validated by comparing large-scale UN data with detailed firm-level data, confirming that the risk signatures are consistent. This scalable tool provides customs authorities with a transparent, data-driven method to shift from conventional to priority-based inspection protocols, translating complex data into actionable intelligence to support international environmental policies.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.08638

Country:

Europe (0.70)
Asia > Malaysia (0.15)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government (1.00)
Banking & Finance > Trading (1.00)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.47)

Add feedback

Binary Split Categorical feature with Mean Absolute Error Criteria in CART

Yu, Peng, Chen, Yike, Xu, Chao, Bifet, Albert, Read, Jesse

arXiv.org Artificial IntelligenceNov-12-2025

In the context of the Classification and Regression Trees (CART) algorithm, the efficient splitting of categorical features using standard criteria like GINI and Entropy is well-established. However, using the Mean Absolute Error (MAE) criterion for categorical features has traditionally relied on various numerical encoding methods. This paper demonstrates that unsupervised numerical encoding methods are not viable for the MAE criteria. Furthermore, we present a novel and efficient splitting algorithm that addresses the challenges of handling categorical features with the MAE criterion. Our findings underscore the limitations of existing approaches and offer a promising solution to enhance the handling of categorical data in CART algorithms.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.0847

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Feature Importance Guided Random Forest Learning with Simulated Annealing Based Hyperparameter Tuning

Balasubramanian, Kowshik, Williams, Andre, Butun, Ismail

arXiv.org Artificial IntelligenceNov-12-2025

Abstract--This paper introduces a novel framework for enhancing Random Forest classifiers by integrating probabilistic feature sampling and hyperparameter tuning via Simulated Annealing. The proposed framework exhibits substantial advancements in predictive accuracy and generalization, adeptly tackling the multifaceted challenges of robust classification across diverse domains, including credit risk evaluation, anomaly detection in IoT ecosystems, early-stage medical diagnostics, and high-dimensional biological data analysis. T o overcome the limitations of conventional Random Forests, we present an approach that places stronger emphasis on capturing the most relevant signals from data while enabling adaptive hyperparameter configuration. The model is guided towards features that contribute more meaningfully to classification and optimizing this with dynamic parameter tuning. The results demonstrate consistent accuracy improvements and meaningful insights into feature relevance, showcasing the efficacy of combining importance aware sampling and metaheuristic optimization. RFs are widely used ensemble learning methods known for their robustness, interpretability, scalability and performance across diverse machine learning tasks.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.00133

Country: North America > United States > Florida (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Banking & Finance > Credit (0.49)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)
(2 more...)

Add feedback

Automated and Explainable Denial of Service Analysis for AI-Driven Intrusion Detection Systems

Yakubu, Paul Badu, Santana, Lesther, Rahouti, Mohamed, Xin, Yufeng, Chehri, Abdellah, Aledhari, Mohammed

arXiv.org Artificial IntelligenceNov-7-2025

With the increasing frequency and sophistication of Distributed Denial of Service (DDoS) attacks, it has become critical to develop more efficient and interpretable detection methods. Traditional detection systems often struggle with scalability and transparency, hindering real-time response and understanding of attack vectors. This paper presents an automated framework for detecting and interpreting DDoS attacks using machine learning (ML). The proposed method leverages the Tree-based Pipeline Optimization Tool (TPOT) to automate the selection and optimization of ML models and features, reducing the need for manual experimentation. SHapley Additive exPlanations (SHAP) is incorporated to enhance model interpretability, providing detailed insights into the contribution of individual features to the detection process. By combining TPOT's automated pipeline selection with SHAP interpretability, this approach improves the accuracy and transparency of DDoS detection. Experimental results demonstrate that key features such as mean backward packet length and minimum forward packet header length are critical in detecting DDoS attacks, offering a scalable and explainable cybersecurity solution.

data mining, detection, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2511.04114

Country:

North America > United States (0.46)
North America > Canada (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.48)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(5 more...)

Add feedback

Towards Scalable Meta-Learning of near-optimal Interpretable Models via Synthetic Model Generations

Myint, Kyaw Hpone, Wu, Zhe, Day, Alexandre G. R., Iyengar, Giri

arXiv.org Machine LearningNov-7-2025

Decision trees are widely used in high-stakes fields like finance and healthcare due to their interpretability. This work introduces an efficient, scalable method for generating synthetic pre-training data to enable meta-learning of decision trees. Our approach samples near-optimal decision trees synthetically, creating large-scale, realistic datasets. Using the MetaTree transformer architecture, we demonstrate that this method achieves performance comparable to pre-training on real-world data or with computationally expensive optimal decision trees. This strategy significantly reduces computational costs, enhances data generation flexibility, and paves the way for scalable and efficient meta-learning of interpretable decision tree models.

artificial intelligence, decision tree learning, machine learning, (17 more...)

arXiv.org Machine Learning

2511.04

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

SORTeD Rashomon Sets of Sparse Decision Trees: Anytime Enumeration

Arslan, Elif, van der Linden, Jacobus G. M., Hoogendoorn, Serge, Rinaldi, Marco, Demirović, Emir

arXiv.org Artificial IntelligenceNov-6-2025

Sparse decision tree learning provides accurate and interpretable predictive models that are ideal for high-stakes applications by finding the single most accurate tree within a (soft) size limit. Rather than relying on a single "best" tree, Rashomon sets-trees with similar performance but varying structures-can be used to enhance variable importance analysis, enrich explanations, and enable users to choose simpler trees or those that satisfy stakeholder preferences (e.g., fairness) without hard-coding such criteria into the objective function. However, because finding the optimal tree is NP-hard, enumerating the Rashomon set is inherently challenging. Therefore, we introduce SORTD, a novel framework that improves scalability and enumerates trees in the Rashomon set in order of the objective value, thus offering anytime behavior. Our experiments show that SORTD reduces runtime by up to two orders of magnitude compared with the state of the art. Moreover, SORTD can compute Rashomon sets for any separable and totally ordered objective and supports post-evaluating the set using other separable (and partially ordered) objectives. Together, these advances make exploring Rashomon sets more practical in real-world applications.

artificial intelligence, decision tree learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.03344

Country:

Europe (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback