AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
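
The quoted definition can be made concrete with a small sketch. The code below is only an illustration: the `Leaf` and `Branch` classes, the `decide` function, and the attribute names `raining` and `have_umbrella` are hypothetical and do not come from the textbook. It shows a tree of branching tests that maps an object described by Boolean properties to a yes/no decision, i.e., a Boolean function.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node holding the yes/no decision."""
    decision: bool


@dataclass
class Branch:
    """Internal node testing one Boolean property of the input."""
    attribute: str
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Branch]


def decide(node: Node, example: Dict[str, bool]) -> bool:
    """Follow the branching tests from the root until a leaf is reached."""
    while isinstance(node, Branch):
        node = node.if_true if example[node.attribute] else node.if_false
    return node.decision


# Hypothetical tree: go outside unless it is raining and there is no umbrella.
tree = Branch(
    "raining",
    if_true=Branch("have_umbrella", Leaf(True), Leaf(False)),
    if_false=Leaf(True),
)

print(decide(tree, {"raining": True, "have_umbrella": False}))  # False
```

Because every leaf holds a Boolean, the tree as a whole computes a Boolean function of its input properties; replacing `bool` in `Leaf` with a richer type would give the "larger range of outputs" the quotation mentions.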

Application of the representative measure approach to assess the reliability of decision trees in dealing with unseen vehicle collision data

Perera-Lago, Javier, Toscano-Durán, Víctor, Paluzo-Hidalgo, Eduardo, Narteni, Sara, Rucco, Matteo

arXiv.org Artificial Intelligence | Apr-15-2024

Machine learning algorithms are fundamental components of novel data-informed Artificial Intelligence architecture. In this domain, the imperative role of representative datasets is a cornerstone in shaping the trajectory of artificial intelligence (AI) development. Representative datasets are needed to train machine learning components properly. Proper training has multiple impacts: it reduces the final model's complexity, power, and uncertainties. In this paper, we investigate the reliability of the $\varepsilon$-representativeness method to assess the dataset similarity from a theoretical perspective for decision trees. We decided to focus on the family of decision trees because it includes a wide variety of models known to be explainable. Thus, in this paper, we provide a result guaranteeing that if two datasets are related by $\varepsilon$-representativeness, i.e., both of them have points closer than $\varepsilon$, then the predictions by the classic decision tree are similar. Experimentally, we have also tested that $\varepsilon$-representativeness presents a significant correlation with the ordering of the feature importance. Moreover, we extend the results experimentally in the context of unseen vehicle collision data for XGBoost, a machine-learning component widely adopted for dealing with tabular data.
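
One way to read the informal definition in this abstract is sketched below. This is an assumption, not the paper's formal statement: the sketch treats a dataset as $\varepsilon$-representative of another when every point of the second has some point of the first within Euclidean distance $\varepsilon$. The function names, the choice of Euclidean distance, and the non-strict comparison are all illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def is_eps_representative(
    subset: Sequence[Point], dataset: Sequence[Point], eps: float
) -> bool:
    """True if every point of `dataset` has a point of `subset` within `eps`.

    Assumption: Euclidean distance and a non-strict bound (<=).
    """
    return all(
        any(math.dist(p, q) <= eps for q in subset)
        for p in dataset
    )


def smallest_eps(subset: Sequence[Point], dataset: Sequence[Point]) -> float:
    """Smallest eps for which the check above succeeds.

    This is the largest nearest-neighbour distance from `dataset` to `subset`.
    """
    return max(min(math.dist(p, q) for q in subset) for p in dataset)


dataset = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
subset = [(0.0, 0.0)]
print(smallest_eps(subset, dataset))  # 1.0
```

The brute-force double loop is quadratic in the dataset sizes, which is fine for illustration; a spatial index would be the natural replacement for larger data.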

dataset, decision tree, subset, (15 more...)

2404.09541

Country:

Europe > Spain > Andalusia > Seville Province > Seville (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

A Large Scale Survey of Motivation in Software Development and Analysis of its Validity

Amit, Idan, Feitelson, Dror G.

arXiv.org Artificial Intelligence | Apr-12-2024

Context: Motivation is known to improve performance. In software development in particular, there has been considerable interest in the motivation of contributors to open source. Objective: We identify 11 motivators from the literature (enjoying programming, ownership of code, learning, self-use, etc.), and evaluate their relative effect on motivation. Since motivation is an internal subjective feeling, we also analyze the validity of the answers. Method: We conducted a survey with 66 questions on motivation which was completed by 521 developers. Most of the questions used an 11-point scale. We evaluated the validity of the answers by comparing related questions, comparing to actual behavior on GitHub, and comparing with the same developer's answers in a follow-up survey. Results: Validity problems include moderate correlations between answers to related questions, as well as self-promotion and mistakes in the answers. Despite these problems, predictive analysis, investigating how diverse motivators influence the probability of high motivation, provided valuable insights. The correlations between the different motivators are low, implying their independence. High values in all 11 motivators predict increased probability of high motivation. In addition, improvement analysis shows that an increase in most motivators predicts an increase in general motivation.

motivation, motivator, participant, (15 more...)

2404.08303

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation

Biebert, Daniel, Hakert, Christian, Chen, Kuan-Hsun, Chen, Jian-Jia

arXiv.org Artificial Intelligence | Apr-10-2024

Bringing high-level machine learning models to efficient and well-suited machine implementations often involves a number of tools, e.g., code generators, compilers, and optimizers. Along such tool chains, abstractions have to be applied, which leads to suboptimal use of CPU registers. This is a shortcoming, especially in resource-constrained embedded setups. In this work, we present a code generation approach for decision tree ensembles, which produces machine assembly code within a single conversion step directly from the high-level model representation. Specifically, we develop various approaches to effectively allocate registers for the inference of decision tree ensembles. Extensive evaluations of the proposed method are conducted in comparison to the basic realization of C code from the high-level machine learning model and succeeding compilation. The results show that the performance of decision tree ensemble inference can be significantly improved (by up to $\approx1.6\times$), if the methods are applied carefully to the appropriate scenario.
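
For orientation, the baseline this abstract compares against (C code realized from the high-level model, then compiled) can be sketched as a tiny generator that turns a tree into nested if-else statements. This is not the authors' tool and performs no register allocation; the nested-dict tree layout and the names `emit_c` and `emit_function` are assumptions made purely for illustration.

```python
def emit_c(node: dict, indent: int = 1) -> str:
    """Recursively emit C if-else code for one tree.

    Assumed layout: a leaf is {"label": int}; an internal node is
    {"feature": int, "threshold": float, "left": node, "right": node}.
    """
    pad = "    " * indent
    if "label" in node:
        return f"{pad}return {node['label']};\n"
    threshold = repr(float(node["threshold"]))  # e.g. 0.5 -> "0.5"
    return (
        f"{pad}if (x[{node['feature']}] <= {threshold}f) {{\n"
        + emit_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + emit_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def emit_function(name: str, tree: dict) -> str:
    """Wrap the generated if-else tree in a C function definition."""
    return f"int {name}(const float *x) {{\n" + emit_c(tree) + "}\n"


# Hypothetical two-level tree over two features.
tree = {
    "feature": 0,
    "threshold": 0.5,
    "left": {"label": 0},
    "right": {
        "feature": 1,
        "threshold": 2.0,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}

print(emit_function("predict", tree))
```

In a pipeline like this, the C compiler decides which values live in registers; the abstract's approach instead emits assembly directly so that the generator controls that allocation.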

if-else tree, implementation, node, (16 more...)

2404.06846

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Incremental XAI: Memorable Understanding of AI with Incremental Explanations

Bo, Jessica Y., Hao, Pan, Lim, Brian Y.

arXiv.org Artificial Intelligence | Apr-10-2024

Many explainable AI (XAI) techniques strive for interpretability by providing concise salient information, such as sparse linear factors. However, users either only see inaccurate global explanations, or highly-varying local explanations. We propose to provide more detailed explanations by leveraging the human cognitive capacity to accumulate knowledge by incrementally receiving more details. Focusing on linear factor explanations (factors $\times$ values = outcome), we introduce Incremental XAI to automatically partition explanations for general and atypical instances by providing Base + Incremental factors to help users read and remember more faithful explanations. Memorability is improved by reusing base factors and reducing the number of factors shown in atypical cases. In modeling, formative, and summative user studies, we evaluated the faithfulness, memorability and understandability of Incremental XAI against baseline explanation methods. This work contributes towards more usable explanation that users can better ingrain to facilitate intuitive engagement with AI.

explanation, participant, subspace, (13 more...)

2404.06733

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.06)
Asia > Singapore > Central Region > Singapore (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (0.88)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
(3 more...)

Online Learning of Decision Trees with Thompson Sampling

Chaouki, Ayman, Read, Jesse, Bifet, Albert

arXiv.org Artificial Intelligence | Apr-9-2024

Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly researched, mostly in the batch setting with a fixed labelled dataset, leading to popular algorithms such as C4.5, ID3 and CART. Unfortunately, these methods are heuristic in nature: they rely on greedy splits, which offer no guarantees of global optimality and often lead to unnecessarily complex and hard-to-interpret Decision Trees. Recent breakthroughs addressed this suboptimality issue in the batch setting, but no such work has considered the online setting with data arriving in a stream. To this end, we devise a new Monte Carlo Tree Search algorithm, Thompson Sampling Decision Trees (TSDT), able to produce optimal Decision Trees in an online setting. We analyse our algorithm and prove its almost sure convergence to the optimal tree. Furthermore, we conduct extensive experiments to validate our findings empirically. The proposed TSDT outperforms existing algorithms on several benchmarks, all while presenting the practical advantage of being tailored to the online setting.
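
The greedy step that this abstract criticizes can be illustrated with a minimal sketch. The code below is not TSDT; it is a CART-style search for the single threshold split with the lowest weighted Gini impurity, with hypothetical function names. On XOR-labelled data no single split reduces impurity, which shows why a purely greedy learner gives no guarantee of global optimality.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_greedy_split(
    rows: Sequence[Sequence[float]], labels: Sequence[int]
) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, weighted_gini) of the best single split.

    Candidates are tests `row[feature] <= threshold` for every observed value;
    splits that leave one side empty are skipped.
    """
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# XOR labels: every single split leaves impurity unchanged at 0.5,
# although a depth-2 tree classifies the data perfectly.
xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]
print(gini(xor_labels), best_greedy_split(xor_rows, xor_labels))
```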

algorithm, decision tree, online learning, (14 more...)

2404.06403

Country:

Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.70)

Industry: Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

FLEX: FLEXible Federated Learning Framework

Herrera, Francisco, Jiménez-López, Daniel, Argente-Garrido, Alberto, Rodríguez-Barroso, Nuria, Zuheros, Cristina, Aguilera-Martos, Ignacio, Bello, Beatriz, García-Márquez, Mario, Luzón, M. Victoria

arXiv.org Artificial Intelligence | Apr-9-2024

In the realm of Artificial Intelligence (AI), the need for privacy and security in data processing has become paramount. As AI applications continue to expand, the collection and handling of sensitive data raise concerns about individual privacy protection. Federated Learning (FL) emerges as a promising solution to address these challenges by enabling decentralized model training on local devices, thus preserving data privacy. This paper introduces FLEX: a FLEXible Federated Learning Framework designed to provide maximum flexibility in FL research experiments. By offering customizable features for data distribution, privacy parameters, and communication strategies, FLEX empowers researchers to innovate and develop novel FL techniques. The framework also includes libraries for specific FL implementations including: (1) anomalies, (2) blockchain, (3) adversarial attacks and defences, (4) natural language processing and (5) decision trees, enhancing its versatility and applicability in various domains. Overall, FLEX represents a significant advancement in FL research, facilitating the development of robust and efficient FL applications.

actor, flex, github, (16 more...)

2404.06127

Country:

Europe > Spain > Andalusia > Granada Province > Granada (0.04)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
North America > United States > Virginia (0.04)
(4 more...)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)

QRscript: Embedding a Programming Language in QR codes to support Decision and Management

Scanzio, Stefano, Cena, Gianluca, Valenzano, Adriano

arXiv.org Artificial Intelligence | Apr-7-2024

Embedding a programming language in a QR code is a new and extremely promising opportunity, as it makes devices and objects smarter without necessarily requiring an Internet connection. In this paper, all the steps needed to translate a program written in a high-level programming language to its binary representation encoded in a QR code, and the opposite process that, starting from the QR code, executes it by means of a virtual machine, have been carefully detailed. The proposed programming language was named QRscript, and can be easily extended so as to integrate new features. One of the main design goals was to produce a very compact target binary code. In particular, in this work we propose a specific sub-language (a dialect) that is aimed at encoding decision trees. Besides industrial scenarios, this is useful in many other application fields. The reported example, related to the configuration of an industrial networked device, highlights the potential of the proposed technology and makes it easier to understand all the translation steps.

instruction, programming language, qr code, (17 more...)

doi: 10.1109/ETFA52439.2022.9921530

2404.05073

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Africa > Guinea-Bissau (0.04)

Genre: Workflow (0.68)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.51)

Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI

Islam, Taminul, Sheakh, Md. Alif, Tahosin, Mst. Sazia, Hena, Most. Hasna, Akash, Shopnil, Jardan, Yousef A. Bin, Wondmie, Gezahign Fentahun, Nafidi, Hiba-Allah, Bourhia, Mohammed

arXiv.org Artificial Intelligence | Apr-6-2024

Breast cancer has rapidly increased in prevalence in recent years, making it one of the leading causes of mortality worldwide. Among all cancers, it is by far the most common. Diagnosing this illness manually requires significant time and expertise. Since detecting breast cancer is a time-consuming process, preventing its further spread can be aided by creating machine-based forecasts. Machine learning and Explainable AI are crucial in classification as they not only provide accurate predictions but also offer insights into how the model arrives at its decisions, aiding in the understanding and trustworthiness of the classification results. In this study, we evaluate and compare the classification accuracy, precision, recall, and F1 scores of five different machine learning methods using a primary dataset (500 patients from Dhaka Medical College Hospital). Five different supervised machine learning techniques, including decision tree, random forest, logistic regression, naive Bayes, and XGBoost, have been used to achieve optimal results on our dataset. Additionally, this study applied SHAP analysis to the XGBoost model to interpret the model's predictions and understand the impact of each feature on the model's output. We compared the accuracy with which several algorithms classified the data, and contrasted the results with other literature in this field. After final evaluation, this study found that XGBoost achieved the best model accuracy, which is 97%.
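
The four evaluation measures named in this abstract follow directly from the counts of a binary confusion matrix. The sketch below shows the standard definitions on made-up labels; the function name `binary_metrics` and the example data are illustrative and unrelated to the study's dataset or results.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 1 false positive, 1 false negative,
# 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Accuracy alone can look strong on imbalanced medical data, which is one reason studies like this one report precision, recall, and F1 alongside it.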

accuracy, algorithm, dataset, (14 more...)

2404.04686

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.25)
North America > United States > Wisconsin (0.05)
Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Accurate estimation of feature importance faithfulness for tree models

Gajewski, Mateusz, Karczmarz, Adam, Rapicki, Mateusz, Sankowski, Piotr

arXiv.org Artificial Intelligence | Apr-4-2024

One of the key challenges in deploying modern machine learning models in such areas as medical diagnosis lies in the ability to indicate why a certain prediction has been made. Such an indication may be of critical importance when a human decides whether the prediction can be relied on. This is one of the reasons various aspects of explainability of machine learning models have been the subject of extensive research lately (see, e.g., [BH21]). For some basic types of models (e.g., single decision trees), the rationale behind a prediction is easy to understand by a human. However, predictions of more complex models (that offer much better accuracy, e.g., based on neural networks or decision tree ensembles) are also much more difficult to interpret. Accurate and concise explanations understandable to humans might not always exist. In such cases, it is still beneficial to have methods giving a flavor of what factors might have influenced the prediction the most.

algorithm, dataset, pgi 2, (13 more...)

2404.03426

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Poland > Masovia Province > Warsaw (0.04)
Europe > Poland > Greater Poland Province > Poznań (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)

Site-specific Deterministic Temperature and Humidity Forecasts with Explainable and Reliable Machine Learning

Han, MengMeng, Leeuwenburg, Tennessee, Murphy, Brad

arXiv.org Artificial Intelligence | Apr-4-2024

Site-specific weather forecasts are essential to accurate prediction of power demand and are consequently of great interest to energy operators. However, weather forecasts from current numerical weather prediction (NWP) models lack the fine-scale detail to capture all important characteristics of localised real-world sites. Instead, they provide weather information representing a rectangular gridbox (usually kilometres in size). Even after post-processing and bias correction, area-averaged information is usually not optimal for specific sites. Prior work on site-optimised forecasts has focused on linear methods, weighted consensus averaging, time-series methods, and others. Recent developments in machine learning (ML) have prompted increasing interest in applying ML as a novel approach towards this problem. In this study, we investigate the feasibility of optimising forecasts at sites by adopting the popular machine learning model gradient boosting decision tree, supported by the Python version of the XGBoost package. Regression trees have been trained with historical NWP and site observations as training data, aimed at predicting temperature and dew point at multiple site locations across Australia. We developed a working ML framework, named 'Multi-SiteBoost', and initial testing results show a significant improvement compared with gridded values from bias-corrected NWP models. The improvement from XGBoost is found to be comparable with non-ML methods reported in literature. With the insights provided by SHapley Additive exPlanations (SHAP), this study also tests various approaches to understand the ML predictions and increase the reliability of the forecasts generated by ML.

feature value, forecast, prediction, (15 more...)

2404.0331

Country:

Oceania > Australia > Northern Territory > Alice Springs (0.05)
North America > United States > Tennessee (0.04)
Europe > United Kingdom (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Transportation > Air (0.70)
Energy > Renewable (0.67)
Transportation > Infrastructure & Services > Airport (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
