AITopics | Decision Tree Learning


Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.


Early Prediction of Geomagnetic Storms by Machine Learning Algorithms

Yan, Iris

arXiv.org Artificial Intelligence, Jan-17-2024

Geomagnetic storms (GS) occur when solar winds disrupt Earth's magnetosphere. GS can cause severe damage to satellites, power grids, and communication infrastructure. The estimated direct economic impact of a large-scale GS exceeds $40 billion a day in the US. Early prediction is critical in preventing and minimizing the hazards. However, current methods either predict several hours ahead but fail to identify all types of GS, or make predictions only a short time, e.g., one hour, ahead of the occurrence. This work aims to predict all types of geomagnetic storms reliably and as early as possible using big data and machine learning algorithms. By fusing big data collected from multiple ground stations around the world on different aspects of solar measurements and using Random Forests regression with feature selection and downsampling of minor geomagnetic storm instances (which make up the majority of the data), we are able to achieve an accuracy of 82.55% on data collected in 2021 when making early predictions three hours in advance. Given that important predictive features such as historic Kp indices are measured every 3 hours and their importance decays quickly as the lead time increases, an early prediction 3 hours ahead of time is believed to be close to the practical limit.
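
The abstract mentions downsampling the over-represented minor-storm instances before training. The exact procedure is not given here, so the following is only a minimal sketch of one common way to do it, random undersampling of the majority class. The function name `downsample_majority`, the toy rows, and the class labels `"minor"` and `"severe"` are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def downsample_majority(rows, label_of, majority_label, target_size, seed=0):
    """Randomly keep at most `target_size` rows of the majority class.

    Hypothetical helper: the paper's actual downsampling procedure may differ.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    majority = [r for r in rows if label_of(r) == majority_label]
    rest = [r for r in rows if label_of(r) != majority_label]
    kept = rng.sample(majority, min(target_size, len(majority)))
    balanced = rest + kept
    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    return balanced


# Toy data (assumed): 90 "minor" rows and 10 "severe" rows.
rows = [(k, "minor") for k in range(90)] + [(k, "severe") for k in range(10)]
balanced = downsample_majority(rows, lambda r: r[1], "minor", target_size=10)
counts = Counter(label for _, label in balanced)
```

After the call, `counts` holds the per-class totals of the reduced training set, which can then be passed to whichever regressor is being trained.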

geomagnetic storm, kp index, prediction, (12 more...)


2401.1029

Country:

North America > United States (0.68)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)

Genre: Research Report (0.50)

Industry: Energy > Power Industry (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.36)


Comparative Study on the Performance of Categorical Variable Encoders in Classification and Regression Tasks

Zhu, Wenbin, Qiu, Runwen, Fu, Ying

arXiv.org Artificial Intelligence, Jan-17-2024

Categorical variables often appear in datasets for classification and regression tasks, and they need to be encoded into numerical values before training. Since many encoders have been developed and can significantly impact performance, choosing the appropriate encoder for a task becomes a time-consuming yet important practical issue. This study broadly classifies machine learning models into three categories: 1) ATI models that implicitly perform affine transformations on inputs, such as the multi-layer perceptron neural network; 2) tree-based models that are based on decision trees, such as random forest; and 3) the rest, such as kNN. Theoretically, we prove that the one-hot encoder is the best choice for ATI models in the sense that it can mimic any other encoder by learning suitable weights from the data. We also explain why the target encoder and its variants are the most suitable encoders for tree-based models. This study conducted comprehensive computational experiments to evaluate 14 encoders, including one-hot and target encoders, along with eight common machine-learning models on 28 datasets. The computational results agree with our theoretical analysis. The findings in this study shed light on how data scientists in fields such as fraud detection and disease diagnosis can select a suitable encoder.
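
To make the two named encoders concrete, here is a small sketch in plain Python, not the study's implementation. It encodes a toy column both ways and then checks the "mimic" idea from the abstract in its simplest form: a linear weight vector applied to the one-hot row reproduces the target-encoded value. The function names and toy data are assumptions made for illustration, and the target encoder shown is the plain per-category mean without the smoothing or cross-fitting that practical variants often add.

```python
from collections import defaultdict


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


def target_encode(values, targets):
    """Replace each category with the mean target observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return means, [means[v] for v in values]


# Toy column and binary target (assumed data).
colors = ["red", "blue", "red", "green", "blue", "red"]
labels = [1, 0, 1, 0, 1, 0]

categories, one_hot = one_hot_encode(colors)
means, target_encoded = target_encode(colors, labels)

# Weights equal to the per-category means make a linear map of the
# one-hot vector reproduce the target encoding exactly.
weights = [means[c] for c in categories]
mimicked = [sum(w * x for w, x in zip(weights, row)) for row in one_hot]
```

Here `mimicked` and `target_encoded` agree element by element, which is the sense in which a model that applies learned weights to one-hot inputs can stand in for another encoder.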

categorical variable, dataset, encoder, (16 more...)


2401.09682

Country:

Europe > United Kingdom (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Invariant Random Forest: Tree-Based Model Solution for OOD Generalization

Liao, Yufan, Wu, Qi, Yan, Xing

arXiv.org Artificial Intelligence, Jan-17-2024

Out-Of-Distribution (OOD) generalization is an essential topic in machine learning. However, recent research focuses only on the corresponding methods for neural networks. This paper introduces a novel and effective solution for OOD generalization of decision tree models, named Invariant Decision Tree (IDT). During the growth of the tree, IDT applies a penalty term to splits whose behavior is unstable or varies across different environments. We also construct its ensemble version, the Invariant Random Forest (IRF). Our proposed method is motivated by a theoretical result under mild conditions, and validated by numerical tests with both synthetic and real datasets. The superior performance compared to non-OOD tree models implies that considering OOD generalization for tree models is absolutely necessary and should be given more attention.

dataset, regression task, splitting variable, (16 more...)


2312.04273

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

Cummins, Logan, Sommers, Alex, Ramezani, Somayeh Bakhtiari, Mittal, Sudip, Jabour, Joseph, Seale, Maria, Rahimi, Shahram

arXiv.org Artificial Intelligence, Jan-15-2024

Predictive maintenance is a well-studied collection of techniques that aims to prolong the life of a mechanical system by using artificial intelligence and machine learning to predict the optimal time to perform maintenance. The methods allow maintainers of systems and hardware to reduce financial and time costs of upkeep. As these methods are adopted for more serious and potentially life-threatening applications, the human operators need to trust the predictive system. This attracts the field of Explainable AI (XAI) to introduce explainability and interpretability into the predictive system. XAI brings methods to the field of predictive maintenance that can amplify users' trust while maintaining well-performing systems. This survey on explainable predictive maintenance (XPM) discusses and presents the current methods of XAI as applied to predictive maintenance while following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. We categorize the different XPM methods into groups that follow the XAI literature. Additionally, we include current challenges and a discussion on future research directions in XPM.

data mining, explanation, machine learning, (20 more...)


2401.07871

Country:

North America > United States > Mississippi (0.28)
North America > United States > Maryland (0.27)
Europe (0.14)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
(9 more...)


Machine Learning Techniques to Identify Hand Gestures amidst Forearm Muscle Signals

Cho, Ryan, Patel, Sunil, Cho, Kyu Taek, Hwang, Jaejin

arXiv.org Artificial Intelligence, Jan-15-2024

This study investigated the use of forearm EMG data for distinguishing eight hand gestures, employing the Neural Network and Random Forest algorithms on data from ten participants. The Neural Network achieved 97 percent accuracy with 1000-millisecond windows, while the Random Forest achieved 85 percent accuracy with 200-millisecond windows. Larger window sizes improved gesture classification due to increased temporal resolution. The Random Forest exhibited faster processing at 92 milliseconds, compared to the Neural Network's 124 milliseconds. In conclusion, the study identified a Neural Network with a 1000-millisecond stream as the most accurate (97 percent), and a Random Forest with a 200-millisecond stream as the most efficient (85 percent). Future research should focus on increasing sample size, incorporating more hand gestures, and exploring different feature extraction methods and modeling algorithms to enhance system accuracy and efficiency.

accuracy, algorithm, application, (16 more...)


2401.07889

Country:

North America > United States > Illinois > DeKalb County > DeKalb (0.04)
North America > United States > Texas (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Consumer Health (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)


Necessary and Sufficient Conditions for Optimal Decision Trees using Dynamic Programming

van der Linden, Jacobus G. M., de Weerdt, Mathijs M., Demirović, Emir

arXiv.org Artificial Intelligence, Jan-15-2024

Global optimization of decision trees has been shown to be promising in terms of accuracy, size, and consequently human comprehensibility. However, many of the methods used rely on general-purpose solvers for which scalability remains an issue. Dynamic programming methods have been shown to scale much better because they exploit the tree structure by solving subtrees as independent subproblems. However, this only works when an objective can be optimized separately for subtrees. We explore this relationship in detail and show the necessary and sufficient conditions for such separability and generalize previous dynamic programming approaches into a framework that can optimize any combination of separable objectives and constraints. Experiments on five application domains show the general applicability of this framework, while outperforming the scalability of general-purpose solvers by a large margin.
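
The idea of solving subtrees as independent subproblems can be illustrated with a toy dynamic program. This is a sketch under simplifying assumptions (binary features, binary labels, misclassification count as the objective), not the framework proposed in the paper. Misclassification count is additive over leaves, so the best left and right subtrees can be found separately and their costs summed. The function name `optimal_tree_error` is hypothetical.

```python
from functools import lru_cache


def optimal_tree_error(X, y, max_depth):
    """Minimum misclassification count over all trees of depth <= max_depth.

    Assumes binary (0/1) features and labels; memoizes on the subset of
    rows reaching a node together with the remaining depth budget.
    """
    n_features = len(X[0])

    @lru_cache(maxsize=None)
    def solve(indices, depth):
        ones = sum(y[i] for i in indices)
        leaf_error = min(ones, len(indices) - ones)  # predict majority label
        if depth == 0 or leaf_error == 0:
            return leaf_error
        best = leaf_error
        for f in range(n_features):
            left = tuple(i for i in indices if X[i][f] == 0)
            right = tuple(i for i in indices if X[i][f] == 1)
            if not left or not right:
                continue  # split does not separate anything
            # Separable objective: each side is optimized independently.
            best = min(best, solve(left, depth - 1) + solve(right, depth - 1))
        return best

    return solve(tuple(range(len(X))), max_depth)


# XOR data: no single split helps, but a depth-2 tree is perfect.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
errors = [optimal_tree_error(X, y, d) for d in range(3)]  # -> [2, 2, 0]
```

The XOR case shows why a globally optimal search can beat greedy splitting: every depth-1 split leaves two errors, yet the depth-2 optimum reaches zero.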

constraint, optimization task, streed, (15 more...)


2305.19706

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Wisconsin (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions

Liu, Xiao, Zhao, Jie, Chen, Wubing, Tan, Mao, Su, Yongxing

arXiv.org Artificial Intelligence, Jan-14-2024

Despite the impressive capabilities of Deep Reinforcement Learning (DRL) agents in many challenging scenarios, their black-box decision-making process significantly limits their deployment in safety-sensitive domains. Several previous self-interpretable works focus on revealing the critical states of the agent's decision. However, they cannot pinpoint the error-prone states. To address this issue, we propose a novel self-interpretable structure, named Backbone Extract Tree (BET), to better explain the agent's behavior by identifying the error-prone states. At a high level, BET hypothesizes that states in which the agent consistently executes uniform decisions exhibit a reduced propensity for errors. To effectively model this phenomenon, BET expresses these states within neighborhoods, each defined by a curated set of representative states. Therefore, states positioned at a greater distance from these representative benchmarks are more prone to error. We evaluate BET in various popular RL environments and show its superiority over existing self-interpretable models in terms of explanation fidelity. Furthermore, we demonstrate a use case for providing explanations for the agents in StarCraft II, a sophisticated multi-agent cooperative game. To the best of our knowledge, we are the first to explain such complex scenarios using a fully transparent structure.

agent, bone, self-interpretable model, (15 more...)


2401.07263

Country:

Asia > Indonesia (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Trinary Decision Trees for handling missing data

Zakrisson, Henning

arXiv.org Machine Learning, Jan-11-2024

This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR) and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lagging behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.

algorithm, node, trinary tree, (15 more...)


2309.03561

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


When eBPF Meets Machine Learning: On-the-fly OS Kernel Compartmentalization

Wang, Zicheng, Chen, Tiejin, Dai, Qinrun, Chen, Yueqi, Wei, Hua, Zeng, Qingkai

arXiv.org Artificial Intelligence, Jan-10-2024

Compartmentalization effectively prevents initial corruption from turning into a successful attack. This paper presents O2C, a pioneering system designed to enforce OS kernel compartmentalization on the fly. It not only provides immediate remediation for sudden threats but also maintains consistent system availability through the enforcement process. O2C is empowered by the newest advancements of the eBPF ecosystem, which allow eBPF programs that perform enforcement actions to be instrumented into the kernel at runtime. O2C takes the lead in embedding a machine learning model into eBPF programs, addressing unique challenges in on-the-fly compartmentalization. Our comprehensive evaluation shows that O2C effectively confines damage within the compartment. Further, we validate that a decision tree is optimally suited for O2C owing to its advantages in processing tabular data, its explainable nature, and its compliance with the eBPF ecosystem. Last but not least, O2C is lightweight, showing negligible overhead and excellent scalability system-wide.

compartment, ebpf program, kernel, (12 more...)


2401.05641

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > United States > Arizona (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)


Optimized Ensemble Model Towards Secured Industrial IoT Devices

Injadat, MohammadNoor

arXiv.org Artificial Intelligence, Jan-10-2024

The continued growth in the deployment of Internet-of-Things (IoT) devices has been fueled by the increased connectivity demand, particularly in industrial environments. However, this has led to an increase in the number of network-related attacks due to the increased number of potential attack surfaces. Industrial IoT (IIoT) devices are prone to various network-related attacks that can have severe consequences on the manufacturing process as well as on the safety of the workers in the manufacturing plant. One promising solution that has emerged in recent years for attack detection is machine learning (ML). More specifically, ensemble learning models have shown great promise in improving the performance of the underlying ML models. Accordingly, this paper proposes a framework based on the combined use of Bayesian Optimization-Gaussian Process (BO-GP) with an ensemble tree-based learning model to improve the performance of intrusion and attack detection in IIoT environments. The proposed framework's performance is evaluated using the Windows 10 dataset collected by the Cyber Range and IoT labs at the University of New South Wales. Experimental results illustrate the improvement in detection accuracy, precision, and F-score when compared to standard tree and ensemble tree models.

dataset, detection, ensemble, (15 more...)


2401.05509

Country:

Oceania > Australia > New South Wales (0.24)
North America > Canada > Ontario (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Internet of Things (1.00)
Information Technology > Data Science > Data Mining (1.00)
(2 more...)
