AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Superfast Selection for Decision Tree Algorithms

Wang, Huaduo, Gupta, Gopal

arXiv.org Artificial IntelligenceJun-3-2024

We present a novel and systematic method, called Superfast Selection, for selecting the "optimal split" for decision tree and feature selection algorithms over tabular data. The method speeds up split selection on a single feature by lowering the time complexity, from O(MN) (using the standard selection methods) to O(M), where M represents the number of input examples and N the number of unique values. Additionally, the need for pre-encoding, such as one-hot or integer encoding, for feature value heterogeneity is eliminated. To demonstrate the efficiency of Superfast Selection, we empower the CART algorithm by integrating Superfast Selection into it, creating what we call Ultrafast Decision Tree (UDT). This enhancement enables UDT to complete the training process with a time complexity O(KM$^2$) (K is the number of features). Additionally, the Training Only Once Tuning enables UDT to avoid the repetitive training process required to find the optimal hyper-parameter. Experiments show that the UDT can finish a single training on KDD99-10% dataset (494K examples with 41 features) within 1 second and tuning with 214.8 sets of hyper-parameters within 0.25 second on a laptop.

node, superfast selection, uci machine learning repository, (12 more...)

arXiv.org Artificial Intelligence

2405.20622

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Dallas County > Richardson (0.04)
Oceania > Australia (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting

Margeloiu, Andrei, Bazaga, Adrián, Simidjievski, Nikola, Liò, Pietro, Jamnik, Mateja

arXiv.org Artificial IntelligenceJun-3-2024

Tabular data is prevalent in many critical domains, yet it is often challenging to acquire in large quantities. This scarcity usually results in poor performance of machine learning models on such data. Data augmentation, a common strategy for performance improvement in vision and language tasks, typically underperforms for tabular data due to the lack of explicit symmetries in the input space. To overcome this challenge, we introduce TabMDA, a novel method for manifold data augmentation on tabular data. This method utilises a pre-trained in-context model, such as TabPFN, to map the data into a manifold space. TabMDA performs label-invariant transformations by encoding the data multiple times with varied contexts. This process explores the manifold of the underlying in-context models, thereby enlarging the training dataset. TabMDA is a training-free method, making it applicable to any classifier. We evaluate TabMDA on five standard classifiers and observe significant performance improvements across various tabular datasets. Our results demonstrate that TabMDA provides an effective way to leverage information from pre-trained in-context models to enhance the performance of downstream classifiers.

classifier, dataset, tabmda, (11 more...)

arXiv.org Artificial Intelligence

2406.01805

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)

Add feedback

Learning Decision Trees and Forests with Algorithmic Recourse

Kanamori, Kentaro, Takagi, Takuya, Kobayashi, Ken, Ike, Yuichi

arXiv.org Machine LearningJun-3-2024

This paper proposes a new algorithm for learning accurate tree-based models while ensuring the existence of recourse actions. Algorithmic Recourse (AR) aims to provide a recourse action for altering the undesired prediction result given by a model. Typical AR methods provide a reasonable action by solving an optimization task of minimizing the required effort among executable actions. In practice, however, such actions do not always exist for models optimized only for predictive performance. To alleviate this issue, we formulate the task of learning an accurate classification tree under the constraint of ensuring the existence of reasonable actions for as many instances as possible. Then, we propose an efficient top-down greedy algorithm by leveraging the adversarial training techniques. We also show that our proposed algorithm can be applied to the random forest, which is known as a popular framework for learning tree ensembles. Experimental results demonstrated that our method successfully provided reasonable actions to more instances than the baselines without significantly degrading accuracy and computational efficiency.

integer 0, learning decision tree and forest, ract, (10 more...)

arXiv.org Machine Learning

2406.01098

Country:

Europe > Austria > Vienna (0.14)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

A Random Forest-based Prediction Model for Turning Points in Antagonistic Event-Group Competitions

Zhu, Zishuo

arXiv.org Artificial IntelligenceJun-1-2024

At present, most of the prediction studies related to antagonistic event-group competitions focus on the prediction of competition results, and less on the prediction of the competition process, which can not provide real-time feedback of the athletes' state information in the actual competition, and thus can not analyze the changes of the competition situation. In order to solve this problem, this paper proposes a prediction model based on Random Forest for the turning point of the antagonistic event-group. Firstly, the quantitative equation of competitive potential energy is proposed; Secondly, the quantitative value of competitive potential energy is obtained by using the dynamic combination of weights method, and the turning point of the competition situation of the antagonistic event-group is marked according to the quantitative time series graph; Finally, the random forest prediction model based on the optimisation of the KM-SMOTE algorithm and the grid search method is established. The experimental analysis shows that: The quantitative equation of competitive potential energy can effectively reflect the dynamic situation of the competition; The model can effectively predict the turning point of the competition situation of the antagonistic event-group, and the recall rate of the model in the test set is 86.13%; The model has certain significance for the future study of the competition situation of the antagonistic event-group.

competition, competitive potential energy, indicator, (14 more...)

arXiv.org Artificial Intelligence

2405.20029

Country:

Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports > Tennis (0.95)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Federated Random Forest for Partially Overlapping Clinical Data

Park, Youngjun, Schmidt, Cord Eric, Batton, Benedikt Marcel, Hauschild, Anne-Christin

arXiv.org Artificial IntelligenceMay-31-2024

In the healthcare sector, a consciousness surrounding data privacy and corresponding data protection regulations, as well as heterogeneous and non-harmonized data, pose huge challenges to large-scale data analysis. Moreover, clinical data often involves partially overlapping features, as some observations may be missing due to various reasons, such as differences in procedures, diagnostic tests, or other recorded patient history information across hospitals or institutes. To address the challenges posed by partially overlapping features and incomplete data in clinical datasets, a comprehensive approach is required. Particularly in the domain of medical data, promising outcomes are achieved by federated random forests whenever features align. However, for most standard algorithms, like random forest, it is essential that all data sets have identical parameters. Therefore, in this work the concept of federated random forest is adapted to a setting with partially overlapping features. Moreover, our research assesses the effectiveness of the newly developed federated random forest models for partially overlapping clinical data. For aggregating the federated, globally optimized model, only features available locally at each site can be used. We tackled two issues in federation: (i) the quantity of involved parties, (ii) the varying overlap of features. This evaluation was conducted across three clinical datasets. The federated random forest model even in cases where only a subset of features overlaps consistently demonstrates superior performance compared to its local counterpart. This holds true across various scenarios, including datasets with imbalanced classes. Consequently, federated random forests for partially overlapped data offer a promising solution to transcend barriers in collaborative research and corporate cooperation.

dataset, forest model, random forest, (14 more...)

arXiv.org Artificial Intelligence

2405.20738

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Wisconsin (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

Counterfactual Metarules for Local and Global Recourse

Bewley, Tom, Amoukou, Salim I., Mishra, Saumitra, Magazzeni, Daniele, Veloso, Manuela

arXiv.org Artificial IntelligenceMay-29-2024

We introduce T-CREx, a novel model-agnostic method for local and global counterfactual explanation (CE), which summarises recourse options for both individuals and groups in the form of human-readable rules. It leverages tree-based surrogate models to learn the counterfactual rules, alongside 'metarules' denoting their regions of optimality, providing both a global analysis of model behaviour and diverse recourse options for users. Experiments indicate that T-CREx achieves superior aggregate performance over existing rule-based baselines on a range of CE desiderata, while being orders of magnitude faster to run.

dataset, explanation, metarule, (13 more...)

arXiv.org Artificial Intelligence

2405.18875

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.68)
Banking & Finance (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.48)

Add feedback

Unlocking Futures: A Natural Language Driven Career Prediction System for Computer Science and Software Engineering Students

Faruque, Sakir Hossain, Khushbu, Sharun Akter, Akter, Sharmin

arXiv.org Artificial IntelligenceMay-28-2024

A career is a crucial aspect for any person to fulfill their desires through hard work. During their studies, students cannot find the best career suggestions unless they receive meaningful guidance tailored to their skills. Therefore, we developed an AI-assisted model for early prediction to provide better career suggestions. Although the task is difficult, proper guidance can make it easier. Effective career guidance requires understanding a student's academic skills, interests, and skill-related activities. In this research, we collected essential information from Computer Science (CS) and Software Engineering (SWE) students to train a machine learning (ML) model that predicts career paths based on students' career-related information. To adequately train the models, we applied Natural Language Processing (NLP) techniques and completed dataset pre-processing. For comparative analysis, we utilized multiple classification ML algorithms and deep learning (DL) algorithms. This study contributes valuable insights to educational advising by providing specific career suggestions based on the unique features of CS and SWE students. Additionally, the research helps individual CS and SWE students find suitable jobs that match their skills, interests, and skill-related activities.

accuracy, algorithm, student, (13 more...)

arXiv.org Artificial Intelligence

2405.18139

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Europe > Switzerland (0.04)
Europe > Germany > Berlin (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Banking & Finance (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Add feedback

Exploring Loss Design Techniques For Decision Tree Robustness To Label Noise

Sztukiewicz, Lukasz, Good, Jack Henry, Dubrawski, Artur

arXiv.org Machine LearningMay-27-2024

In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, and a gap exists in exploring interpretable models, particularly those rooted in decision trees. In this study, we investigate whether ideas from deep learning loss design can be applied to improve the robustness of decision trees. In particular, we show that loss correction and symmetric losses, both standard approaches, are not effective. We argue that other directions need to be explored to improve the robustness of decision trees to label noise.

backward 0, correction 0, forward 0, (13 more...)

arXiv.org Machine Learning

2405.17672

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Machine learning in business process management: A systematic literature review

Weinzierl, Sven, Zilker, Sandra, Dunzer, Sebastian, Matzner, Martin

arXiv.org Artificial IntelligenceMay-25-2024

Machine learning (ML) provides algorithms to create computer programs based on data without explicitly programming them. In business process management (BPM), ML applications are used to analyse and improve processes efficiently. Three frequent examples of using ML are providing decision support through predictions, discovering accurate process models, and improving resource allocation. This paper organises the body of knowledge on ML in BPM. We extract BPM tasks from different literature streams, summarise them under the phases of a process`s lifecycle, explain how ML helps perform these tasks and identify technical commonalities in ML implementations across tasks. This study is the first exhaustive review of how ML has been used in BPM. We hope that it can open the door for a new era of cumulative research by helping researchers to identify relevant preliminary work and then combine and further develop existing approaches in a focused fashion. Our paper helps managers and consultants to find ML applications that are relevant in the current project phase of a BPM initiative, like redesigning a business process. We also offer - as a synthesis of our review - a research agenda that spreads ten avenues for future research, including applying novel ML concepts like federated learning, addressing less regarded BPM lifecycle phases like process identification, and delivering ML applications with a focus on end-users.

information system, literature review, process mining, (12 more...)

arXiv.org Artificial Intelligence

2405.16396

Country:

Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.14)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Information Technology (1.00)
Banking & Finance (0.92)
Education (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(14 more...)

Add feedback

Exploring Nutritional Impact on Alzheimer's Mortality: An Explainable AI Approach

Liu, Ziming, Liu, Longjian, Heidel, Robert E., Zhao, Xiaopeng

arXiv.org Artificial IntelligenceMay-25-2024

This article uses machine learning (ML) and explainable artificial intelligence (XAI) techniques to investigate the relationship between nutritional status and mortality rates associated with Alzheimers disease (AD). The Third National Health and Nutrition Examination Survey (NHANES III) database is employed for analysis. The random forest model is selected as the base model for XAI analysis, and the Shapley Additive Explanations (SHAP) method is used to assess feature importance. The results highlight significant nutritional factors such as serum vitamin B12 and glycated hemoglobin. The study demonstrates the effectiveness of random forests in predicting AD mortality compared to other diseases. This research provides insights into the impact of nutrition on AD and contributes to a deeper understanding of disease progression.

alzheimer, mortality, nutritional feature, (14 more...)

arXiv.org Artificial Intelligence

2405.17502

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.15)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre:

Research Report > Experimental Study (0.89)
Research Report > New Finding (0.69)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Public Health (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.91)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.56)

Add feedback