AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Identifying Key Features for Establishing Sustainable Agro-Tourism Centre: A Data Driven Approach

Gadakh, Alka, Kumbhar, Vidya, Khosla, Sonal, Karunendra, Kumar

arXiv.org Artificial IntelligenceSep-12-2025

Agro-tourism serves as a strategic economic model designed to facilitate rural development by diversifying income streams for local communities like farmers while promoting the conservation of indigenous cultural heritage and traditional agricultural practices. As a very booming subdomain of tourism, there is a need to study the strategies for the growth of Agro-tourism in detail. The current study has identified the important indicators for the growth and enhancement of agro-tourism. The study is conducted in two phases: identification of the important indicators through a comprehensive literature review and in the second phase state-of-the-art techniques were used to identify the important indicators for the growth of agro-tourism. The indicators are also called features synonymously, the machine learning models for feature selection were applied and it was observed that the Least Absolute Shrinkage and Selection Operator (LASSO) method combined with, the machine Learning Classifiers such as Logistic Regression (LR), Decision Trees (DT), Random Forest (RF) Tree, and Extreme Gradient Boosting (XGBOOST) models were used to suggest the growth of the agro-tourism. The results show that with the LASSO method, LR model gives the highest classification accuracy of 98% in 70-30% train-test data followed by RF with 95% accuracy. Similarly, in the 80-20% train-test data LR maintains the highest accuracy at 99%, while DT and XGBoost follow with 97% accuracy.

artificial intelligence, machine learning, tourism, (16 more...)

arXiv.org Artificial Intelligence

2509.09214

Country: Asia > Indonesia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Food & Agriculture > Agriculture (1.00)
Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Random Forest Stratified K-Fold Cross Validation on SYN DoS Attack SD-IoV

Zamrai, Muhammad Arif Hakimi, Yusof, Kamaludin Mohd

arXiv.org Artificial IntelligenceSep-10-2025

In response to the prevalent concern of TCP SYN flood attacks within the context of Software-Defined Internet of Vehicles (SD-IoV), this study addresses the significant challenge of network security in rapidly evolving vehicular communication systems. This research focuses on optimizing a Random Forest Classifier model to achieve maximum accuracy and minimal detection time, thereby enhancing vehicular network security. The methodology involves preprocessing a dataset containing SYN attack instances, employing feature scaling and label encoding techniques, and applying Stratified K-Fold cross-validation to target key metrics such as accuracy, precision, recall, and F1-score. This research achieved an average value of 0.999998 for all metrics with a SYN DoS attack detection time of 0.24 seconds. Results show that the fine-tuned Random Forest model, configured with 20 estimators and a depth of 10, effectively differentiates between normal and malicious traffic with high accuracy and minimal detection time, which is crucial for SD-IoV networks. This approach marks a significant advancement and introduces a state-of-the-art algorithm in detecting SYN flood attacks, combining high accuracy with minimal detection time. It contributes to vehicular network security by providing a robust solution against TCP SYN flood attacks while maintaining network efficiency and reliability.

accuracy, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICCET62255.2024.00008 10.1109/ICCET62255.2024.00008 10.1109/ICCET62255.2024.00008

2509.07016

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)

Add feedback

Graph Transformer-Based Flood Susceptibility Mapping: Application to the French Riviera and Railway Infrastructure Under Climate Change

Vemula, Sreenath, Gatti, Filippo, Jehel, Pierre

arXiv.org Artificial IntelligenceSep-8-2025

Increasing flood frequency and severity due to climate change threatens infrastructure and demands improved susceptibility mapping techniques. While traditional machine learning (ML) approaches are widely used, they struggle to capture spatial dependencies and poor boundary delineation between susceptibility classes. This study introduces the first application of a graph transformer (GT) architecture for flood susceptibility mapping to the flood-prone French Riviera (e.g., 2020 Storm Alex) using topography, hydrology, geography, and environmental data. GT incorporates watershed topology using Laplacian positional encoders (PEs) and attention mechanisms. The developed GT model has an AUC-ROC (0.9739), slightly lower than XGBoost (0.9853). However, the GT model demonstrated better clustering and delineation with a higher Moran's I value (0.6119) compared to the random forest (0.5775) and XGBoost (0.5311) with p-value lower than 0.0001. Feature importance revealed a striking consistency across models, with elevation, slope, distance to channel, and convergence index being the critical factors. Dimensionality reduction on Laplacian PEs revealed partial clusters, indicating they could capture spatial information; however, their importance was lower than flood factors. Since climate and land use changes aggravate flood risk, susceptibility maps are developed for the 2050 year under different Representative Concentration Pathways (RCPs) and railway track vulnerability is assessed. All RCP scenarios revealed increased area across susceptibility classes, except for the very low category. RCP 8.5 projections indicate that 17.46% of the watershed area and 54% of railway length fall within very-high susceptible zones, compared to 6.19% and 35.61%, respectively, under current conditions. The developed maps can be integrated into a multi-hazard framework.

artificial intelligence, machine learning, susceptibility, (19 more...)

arXiv.org Artificial Intelligence

2504.03727

Country:

Asia (0.92)
Europe > France > Provence-Alpes-Côte d'Azur (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Transportation > Ground > Rail (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Predicting Antimicrobial Resistance (AMR) in Campylobacter, a Foodborne Pathogen, and Cost Burden Analysis Using Machine Learning

Mishra, Shubham, Han, The Anh, Lopes, Bruno Silvester, Ghareeb, Shatha, Shamszaman, Zia Ush

arXiv.org Artificial IntelligenceSep-5-2025

Antimicrobial resistance (AMR) poses a significant public health and economic challenge, increasing treatment costs and reducing antibiotic effectiveness. This study employs machine learning to analyze genomic and epidemiological data from the public databases for molecular typing and microbial genome diversity (PubMLST), incorporating data from UK government-supported AMR surveillance by the Food Standards Agency and Food Standards Scotland. We identify AMR patterns in Campylobacter jejuni and Campylobacter coli isolates collected in the UK from 2001 to 2017. The research integrates whole-genome sequencing (WGS) data, epidemiological metadata, and economic projections to identify key resistance determinants and forecast future resistance trends and healthcare costs. We investigate gyrA mutations for fluoroquinolone resistance and the tet(O) gene for tetracycline resistance, training a Random Forest model validated with bootstrap resampling (1,000 samples, 95% confidence intervals), achieving 74% accuracy in predicting AMR phenotypes. Time-series forecasting models (SARIMA, SIR, and Prophet) predict a rise in campylobacteriosis cases, potentially exceeding 130 cases per 100,000 people by 2050, with an economic burden projected to surpass 1.9 billion GBP annually if left unchecked. An enhanced Random Forest system, analyzing 6,683 isolates, refines predictions by incorporating temporal patterns, uncertainty estimation, and resistance trend modeling, indicating sustained high beta-lactam resistance, increasing fluoroquinolone resistance, and fluctuating tetracycline resistance.

artificial intelligence, machine learning, resistance, (19 more...)

arXiv.org Artificial Intelligence

2509.03551

Country: Europe > United Kingdom > England (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.89)

Industry:

Water & Waste Management > Water Management > Constituents > Bacteria (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Bayesian Additive Regression Trees for functional ANOVA model

Park, Seokhun, Kong, Insung, Kim, Yongdai

arXiv.org Machine LearningSep-4-2025

Bayesian Additive Regression Trees (BART) is a powerful statistical model that leverages the strengths of Bayesian inference and regression trees. It has received significant attention for capturing complex non-linear relationships and interactions among predictors. However, the accuracy of BART often comes at the cost of interpretability. To address this limitation, we propose ANOVA Bayesian Additive Regression Trees (ANOVA-BART), a novel extension of BART based on the functional ANOVA decomposition, which is used to decompose the variability of a function into different interactions, each representing the contribution of a different set of covariates or factors. Our proposed ANOVA-BART enhances interpretability, preserves and extends the theoretical guarantees of BART, and achieves superior predictive performance. Specifically, we establish that the posterior concentration rate of ANOVA-BART is nearly minimax optimal, and further provides the same convergence rates for each interaction that are not available for BART. Moreover, comprehensive experiments confirm that ANOVA-BART surpasses BART in both accuracy and uncertainty quantification, while also demonstrating its effectiveness in component selection. These results suggest that ANOVA-BART offers a compelling alternative to BART by balancing predictive accuracy, interpretability, and theoretical consistency.

anov a-bart, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2509.03317

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Exploring the Design Space of Fair Tree Learning Algorithms

Stempel, Kiara, Cerrato, Mattia, Kramer, Stefan

arXiv.org Artificial IntelligenceSep-4-2025

Decision trees have been studied extensively in the context of fairness, aiming to maximize prediction performance while ensuring non-discrimination against different groups. Techniques in this space usually focus on imposing constraints at training time, constraining the search space so that solutions which display unacceptable values of relevant metrics are not considered, discarded, or discouraged. If we assume one target variable y and one sensitive attribute s, the design space of tree learning algorithms can be spanned as follows: (i) One can have one tree T that is built using an objective function that is a function of y, s, and T. For instance, one can build a tree based on the weighted information gain regarding y (maximizing) and s (minimizing). (ii) The second option is to have one tree model T that uses an objective function in y and T and a constraint on s and T. Here, s is no longer part of the objective, but part of a constraint. This can be achieved greedily by aborting a further split as soon as the condition that optimizes the objective in y fails to satisfy the constraint on s. A simple way to explore other splits is to backtrack during tree construction once a fairness constraint is violated. (iii) The third option is to have two trees T_y and T_s, one for y and one for s, such that the tree structure for y and s does not have to be shared. In this way, information regarding y and regarding s can be used independently, without having to constrain the choices in tree construction by the mutual information between the two variables. Quite surprisingly, of the three options, only the first one and the greedy variant of the second have been studied in the literature so far. In this paper, we introduce the above two additional options from that design space and characterize them experimentally on multiple datasets.

artificial intelligence, decision tree learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.03204

Country: Europe > Germany (0.14)

Genre: Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Probit Monotone BART

Fisher, Jared D.

arXiv.org Machine LearningSep-3-2025

Bayesian Additive Regression Trees (BART) of Chipman et al. (2010) has proven to be a powerful tool for nonparametric modeling and prediction. Monotone BART (Chipman et al., 2022) is a recent development that allows BART to be more precise in estimating monotonic functions. We further these developments by proposing probit monotone BART, which allows the monotone BART framework to estimate conditional mean functions when the outcome variable is binary.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Machine Learning

2509.00263

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Add feedback

Predicting NCAP Safety Ratings: An Analysis of Vehicle Characteristics and ADAS Features Using Machine Learning

Kunwar, Raunak, LeBoulluec, Aera Kim

arXiv.org Artificial IntelligenceSep-3-2025

Vehicle safety assessment is crucial for consumer information and regulatory oversight. The New Car Assessment Program (NCAP) assigns standardized safety ratings, which traditionally emphasize passive safety measures but now include active safety technologies such as Advanced Driver-Assistance Systems (ADAS). It is crucial to understand how these various systems interact empirically. This study explores whether particular ADAS features like Forward Collision Warning, Lane Departure Warning, Crash Imminent Braking, and Blind Spot Detection, together with established vehicle attributes (e.g., Curb Weight, Model Year, Vehicle Type, Drive Train), can reliably predict a vehicle's likelihood of earning the highest (5-star) overall NCAP rating. Using a publicly available dataset derived from NCAP reports that contain approximately 5,128 vehicle variants spanning model years 2011-2025, we compared four different machine learning models: logistic regression, random forest, gradient boosting, and support vector classifier (SVC) using a 5-fold stratified cross-validation approach. The two best-performing algorithms (random forest and gradient boost) were hyperparameter optimized using RandomizedSearchCV. Analysis of feature importance showed that basic vehicle characteristics, specifically curb weight and model year, dominated predictive capability, contributing more than 55% of the feature relevance of the Random Forest model. However, the inclusion of ADAS features also provided meaningful predictive contributions. The optimized Random Forest model achieved robust results on a held-out test set, with an accuracy of 89.18% and a ROC AUC of 0.9586. This research reveals the use of machine learning to analyze large-scale NCAP data and highlights the combined predictive importance of both established vehicle parameters and modern ADAS features to achieve top safety ratings.

artificial intelligence, machine learning, random forest, (13 more...)

arXiv.org Artificial Intelligence

2509.01897

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

A Human-In-The-Loop Approach for Improving Fairness in Predictive Business Process Monitoring

Käppel, Martin, Neuberger, Julian, Möhrlein, Felix, Weinzierl, Sven, Matzner, Martin, Jablonski, Stefan

arXiv.org Artificial IntelligenceAug-26-2025

Predictive process monitoring enables organizations to proactively react and intervene in running instances of a business process. Given an incomplete process instance, predictions about the outcome, next activity, or remaining time are created. This is done by powerful machine learning models, which have shown impressive predictive performance. However, the data-driven nature of these models makes them susceptible to finding unfair, biased, or unethical patterns in the data. Such patterns lead to biased predictions based on so-called sensitive attributes, such as the gender or age of process participants. Previous work has identified this problem and offered solutions that mitigate biases by removing sensitive attributes entirely from the process instance. However, sensitive attributes can be used both fairly and unfairly in the same process instance. For example, during a medical process, treatment decisions could be based on gender, while the decision to accept a patient should not be based on gender. This paper proposes a novel, model-agnostic approach for identifying and rectifying biased decisions in predictive business process monitoring models, even when the same sensitive attribute is used both fairly and unfairly. The proposed approach uses a human-in-the-loop approach to differentiate between fair and unfair decisions through simple alterations on a decision tree model distilled from the original prediction model. Our results show that the proposed approach achieves a promising tradeoff between fairness and accuracy in the presence of biased data. All source code and data are publicly available at https://doi.org/10.5281/zenodo.15387576.

artificial intelligence, event log, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.17477

Country: Europe > Germany > Bavaria (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)

Add feedback

Learning ON Large Datasets Using Bit-String Trees

Gupta, Prashant

arXiv.org Artificial IntelligenceAug-26-2025

This thesis develops computational methods in similarity-preserving hashing, classification, and cancer genomics. Standard space partitioning-based hashing relies on Binary Search Trees (BSTs), but their exponential growth and sparsity hinder efficiency. To overcome this, we introduce Compressed BST of Inverted hash tables (ComBI), which enables fast approximate nearest-neighbor search with reduced memory. On datasets of up to one billion samples, ComBI achieves 0.90 precision with 4X-296X speed-ups over Multi-Index Hashing, and also outperforms Cellfishing.jl on single-cell RNA-seq searches with 2X-13X gains. Building on hashing structures, we propose Guided Random Forest (GRAF), a tree-based ensemble classifier that integrates global and local partitioning, bridging decision trees and boosting while reducing generalization error. Across 115 datasets, GRAF delivers competitive or superior accuracy, and its unsupervised variant (uGRAF) supports guided hashing and importance sampling. We show that GRAF and ComBI can be used to estimate per-sample classifiability, which enables scalable prediction of cancer patient survival. To address challenges in interpreting mutations, we introduce Continuous Representation of Codon Switches (CRCS), a deep learning framework that embeds genetic changes into numerical vectors. CRCS allows identification of somatic mutations without matched normals, discovery of driver genes, and scoring of tumor mutations, with survival prediction validated in bladder, liver, and brain cancers. Together, these methods provide efficient, scalable, and interpretable tools for large-scale data analysis and biomedical applications.

information retrieval, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2508.17083

Country:

Europe (0.67)
Asia > India (0.45)
North America > United States (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.45)
Health & Medicine > Therapeutic Area > Oncology > Brain Cancer (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(5 more...)

Add feedback