AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

ForeCal: Random Forest-based Calibration for DNNs

Nigam, Dhruv

arXiv.org Artificial IntelligenceSep-4-2024

Deep neural network(DNN) based classifiers do extremely well in discriminating between observations, resulting in higher ROC AUC and accuracy metrics, but their outputs are often miscalibrated with respect to true event likelihoods. Post-hoc calibration algorithms are often used to calibrate the outputs of these classifiers. Methods like Isotonic regression, Platt scaling, and Temperature scaling have been shown to be effective in some cases but are limited by their parametric assumptions and/or their inability to capture complex non-linear relationships. We propose ForeCal - a novel post-hoc calibration algorithm based on Random forests. ForeCal exploits two unique properties of Random forests: the ability to enforce weak monotonicity and range-preservation. It is more powerful in achieving calibration than current state-of-the-art methods, is non-parametric, and can incorporate exogenous information as features to learn a better calibration function. Through experiments on 43 diverse datasets from the UCI ML repository, we show that ForeCal outperforms existing methods in terms of Expected Calibration Error(ECE) with minimal impact on the discriminative power of the base DNN as measured by AUC.

calibration, calibration function, probability, (14 more...)

arXiv.org Artificial Intelligence

2409.02446

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > India > Maharashtra > Mumbai (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.85)

Add feedback

Minimising changes to audit when updating decision trees

Simmons, Anj, Barnett, Scott, Chaudhuri, Anupam, Singh, Sankhya, Sivasothy, Shangeetha

arXiv.org Artificial IntelligenceAug-29-2024

Interpretable models are important, but what happens when the model is updated on new training data? We propose an algorithm for updating a decision tree while minimising the number of changes to the tree that a human would need to audit. We achieve this via a greedy approach that incorporates the number of changes to the tree as part of the objective function. We compare our algorithm to existing methods and show that it sits in a sweet spot between final accuracy and number of changes to audit.

algorithm, decision tree, node, (16 more...)

arXiv.org Artificial Intelligence

2408.16321

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Subgroup Analysis via Model-based Rule Forest

Cheng, I-Ling, Hsu, Chan, Ku, Chantung, Lee, Pei-Ju, Kang, Yihuang

arXiv.org Artificial IntelligenceAug-27-2024

Machine learning models are often criticized for their black-box nature, raising concerns about their applicability in critical decision-making scenarios. Consequently, there is a growing demand for interpretable models in such contexts. In this study, we introduce Model-based Deep Rule Forests (mobDRF), an interpretable representation learning algorithm designed to extract transparent models from data. By leveraging IF-THEN rules with multi-level logic expressions, mobDRF enhances the interpretability of existing models without compromising accuracy. We apply mobDRF to identify key risk factors for cognitive decline in an elderly population, demonstrating its effectiveness in subgroup analysis and local model optimization. Our method offers a promising solution for developing trustworthy and interpretable machine learning models, particularly valuable in fields like healthcare, where understanding differential effects across patient subgroups can lead to more personalized and effective treatments.

interpretability, mobdrf, representation, (15 more...)

arXiv.org Artificial Intelligence

2408.15057

Country:

Europe > Austria > Vienna (0.14)
Asia > Taiwan > Takao Province > Kaohsiung (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.35)
Health & Medicine > Therapeutic Area > Neurology (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Machine Learning for Quantifier Selection in cvc5

Jakubův, Jan, Janota, Mikoláš, Piepenbrock, Jelle, Urban, Josef

arXiv.org Artificial IntelligenceAug-26-2024

In this work we considerably improve the state-of-the-art SMT solving on first-order quantified problems by efficient machine learning guidance of quantifier selection. Quantifiers represent a significant challenge for SMT and are technically a source of undecidability. In our approach, we train an efficient machine learning model that informs the solver which quantifiers should be instantiated and which not. Each quantifier may be instantiated multiple times and the set of the active quantifiers changes as the solving progresses. Therefore, we invoke the ML predictor many times, during the whole run of the solver. To make this efficient, we use fast ML models based on gradient boosting decision trees. We integrate our approach into the state-of-the-art cvc5 SMT solver and show a considerable increase of the system's holdout-set performance after training it on a large set of first-order problems collected from the Mizar Mathematical Library.

formula, instantiation, portfolio, (16 more...)

arXiv.org Artificial Intelligence

2408.14338

Country:

Europe > Czechia > Prague (0.04)
Europe > Austria > Tyrol > Innsbruck (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Europe > Germany (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.48)

Add feedback

Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations

Maturo, Fabrizio, Porreca, Annamaria

arXiv.org Machine LearningAug-23-2024

This paper introduces a novel supervised classification strategy that integrates functional data analysis (FDA) with tree-based methods, addressing the challenges of high-dimensional data and enhancing the classification performance of existing functional classifiers. Specifically, we propose augmented versions of functional classification trees and functional random forests, incorporating a new tool for assessing the importance of functional principal components. This tool provides an ad-hoc method for determining unbiased permutation feature importance in functional data, particularly when dealing with correlated features derived from successive derivatives. Our study demonstrates that these additional features can significantly enhance the predictive power of functional classifiers. Experimental evaluations on both real-world and simulated datasets showcase the effectiveness of the proposed methodology, yielding promising results compared to existing methods.

afct, derivative, functional data, (14 more...)

arXiv.org Machine Learning

2408.13179

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)

Add feedback

RIFF: Inducing Rules for Fraud Detection from Decision Trees

Martins, João Lucas, Bravo, João, Gomes, Ana Sofia, Soares, Carlos, Bizarro, Pedro

arXiv.org Artificial IntelligenceAug-23-2024

Financial fraud is the cause of multi-billion dollar losses annually. Traditionally, fraud detection systems rely on rules due to their transparency and interpretability, key features in domains where decisions need to be explained. However, rule systems require significant input from domain experts to create and tune, an issue that rule induction algorithms attempt to mitigate by inferring rules directly from data. We explore the application of these algorithms to fraud detection, where rule systems are constrained to have a low false positive rate (FPR) or alert rate, by proposing RIFF, a rule induction algorithm that distills a low FPR rule set directly from decision trees. Our experiments show that the induced rules are often able to maintain or improve performance of the original models for low FPR tasks, while substantially reducing their complexity and outperforming rules hand-tuned by experts.

algorithm, decision tree, riff, (14 more...)

arXiv.org Artificial Intelligence

2408.12989

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
Asia > Taiwan (0.06)
Europe > Portugal > Porto > Porto (0.04)

Genre: Research Report (0.71)

Industry: Law Enforcement & Public Safety > Fraud (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Randomness control and reproducibility study of random forest algorithm in R and Python

Camadini, Louisa, Bouzid, Yanis, Merlet, Maeva, Carron, Léopold

arXiv.org Artificial IntelligenceAug-22-2024

When it comes to the safety of cosmetic products, compliance with regulatory standards is crucialto guarantee consumer protection against the risks of skin irritation. Toxicologists must thereforebe fully conversant with all risks. This applies not only to their day-to-day work, but also to allthe algorithms they integrate into their routines. Recognizing this, ensuring the reproducibility ofalgorithms becomes one of the most crucial aspects to address.However, how can we prove the robustness of an algorithm such as the random forest, that reliesheavily on randomness? In this report, we will discuss the strategy of integrating random forest intoocular tolerance assessment for toxicologists.We will compare four packages: randomForest and Ranger (R packages), adapted in Python via theSKRanger package, and the widely used Scikit-Learn with the RandomForestClassifier() function.Our goal is to investigate the parameters and sources of randomness affecting the outcomes ofRandom Forest algorithms.By setting comparable parameters and using the same Pseudo-Random Number Generator (PRNG),we expect to reproduce results consistently across the various available implementations of therandom forest algorithm. Nevertheless, this exploration will unveil hidden layers of randomness andguide our understanding of the critical parameters necessary to ensure reproducibility across all fourimplementations of the random forest algorithm.

python, random forest algorithm, randomness control and reproducibility study

arXiv.org Artificial Intelligence

2408.12184

Genre: Research Report (0.69)

Industry: Consumer Products & Services (0.73)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Demystifying Functional Random Forests: Novel Explainability Tools for Model Transparency in High-Dimensional Spaces

Maturo, Fabrizio, Porreca, Annamaria

arXiv.org Artificial IntelligenceAug-22-2024

The advent of big data has raised significant challenges in analysing high-dimensional datasets across various domains such as medicine, ecology, and economics. Functional Data Analysis (FDA) has proven to be a robust framework for addressing these challenges, enabling the transformation of high-dimensional data into functional forms that capture intricate temporal and spatial patterns. However, despite advancements in functional classification methods and very high performance demonstrated by combining FDA and ensemble methods, a critical gap persists in the literature concerning the transparency and interpretability of black-box models, e.g. Functional Random Forests (FRF). In response to this need, this paper introduces a novel suite of explainability tools to illuminate the inner mechanisms of FRF. We propose using Functional Partial Dependence Plots (FPDPs), Functional Principal Component (FPC) Probability Heatmaps, various model-specific and model-agnostic FPCs' importance metrics, and the FPC Internal-External Importance and Explained Variance Bubble Plot. These tools collectively enhance the transparency of FRF models by providing a detailed analysis of how individual FPCs contribute to model predictions. By applying these methods to an ECG dataset, we demonstrate the effectiveness of these tools in revealing critical patterns and improving the explainability of FRF.

fpc, functional data, probability, (15 more...)

arXiv.org Artificial Intelligence

2408.12288

Country:

North America > United States > New York (0.04)
Europe > Italy (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.85)

Add feedback

OPTDTALS: Approximate Logic Synthesis via Optimal Decision Trees Approach

Hu, Hao, Cai, Shaowei

arXiv.org Artificial IntelligenceAug-22-2024

The growing interest in Explainable Artificial Intelligence (XAI) motivates promising studies of computing optimal Interpretable Machine Learning models, especially decision trees. Such models generally provide optimality in compact size or empirical accuracy. Recent works focus on improving efficiency due to the natural scalability issue. The application of such models to practical problems is quite limited. As an emerging problem in circuit design, Approximate Logic Synthesis (ALS) aims to reduce circuit complexity by sacrificing correctness. Recently, multiple heuristic machine learning methods have been applied in ALS, which learns approximated circuits from samples of input-output pairs. In this paper, we propose a new ALS methodology realizing the approximation via learning optimal decision trees in empirical accuracy. Compared to previous heuristic ALS methods, the guarantee of optimality achieves a more controllable trade-off between circuit complexity and accuracy. Experimental results show clear improvements in our methodology in the quality of approximated designs (circuit complexity and accuracy) compared to the state-of-the-art approaches.

accuracy, approximated design, optdtal, (13 more...)

arXiv.org Artificial Intelligence

2408.12304

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
(23 more...)

Genre:

Research Report > New Finding (0.48)
Overview > Innovation (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Simplifying Random Forests' Probabilistic Forecasts

Koster, Nils, Krüger, Fabian

arXiv.org Machine LearningAug-22-2024

Since their introduction by Breiman, Random Forests (RFs) have proven to be useful for both classification and regression tasks. The RF prediction of a previously unseen observation can be represented as a weighted sum of all training sample observations. This nearest-neighbor-type representation is useful, among other things, for constructing forecast distributions (Meinshausen, 2006). In this paper, we consider simplifying RF-based forecast distributions by sparsifying them. That is, we focus on a small subset of nearest neighbors while setting the remaining weights to zero. This sparsification step greatly improves the interpretability of RF predictions. It can be applied to any forecasting task without re-training existing RF models. In empirical experiments, we document that the simplified predictions can be similar to or exceed the original ones in terms of forecasting performance. We explore the statistical sources of this finding via a stylized analytical model of RFs. The model suggests that simplification is particularly promising if the unknown true forecast distribution contains many small weights that are estimated imprecisely.

forecast, forecast distribution, hyperparameter, (16 more...)

arXiv.org Machine Learning

2408.12332

Country:

North America > United States (0.14)
Europe > Austria > Vienna (0.14)
Europe > Switzerland > Zürich > Zürich (0.05)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback