AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

A New Class of Explanations for Classifiers with Non-Binary Features

Ji, Chunxi, Darwiche, Adnan

arXiv.org Artificial IntelligenceJul-22-2023

Two types of explanations have been receiving increased attention in the literature when analyzing the decisions made by classifiers. The first type explains why a decision was made and is known as a sufficient reason for the decision, also an abductive explanation or a PI-explanation. The second type explains why some other decision was not made and is known as a necessary reason for the decision, also a contrastive or counterfactual explanation. These explanations were defined for classifiers with binary, discrete and, in some cases, continuous features. We show that these explanations can be significantly improved in the presence of non-binary features, leading to a new class of explanations that relay more information about decisions and the underlying classifiers. Necessary and sufficient reasons were also shown to be the prime implicates and implicants of the complete reason for a decision, which can be obtained using a quantification operator. We show that our improved notions of necessary and sufficient reasons are also prime implicates and implicants but for an improved notion of complete reason obtained by a new quantification operator that we also define and study.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2304.1476

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.27)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.47)

Add feedback

Finding Optimal Diverse Feature Sets with Alternative Feature Selection

Bach, Jakob

arXiv.org Artificial IntelligenceJul-21-2023

Feature selection is popular for obtaining small, interpretable, yet highly accurate prediction models. Conventional feature-selection methods typically yield one feature set only, which might not suffice in some scenarios. For example, users might be interested in finding alternative feature sets with similar prediction quality, offering different explanations of the data. In this article, we introduce alternative feature selection and formalize it as an optimization problem. In particular, we define alternatives via constraints and enable users to control the number and dissimilarity of alternatives. Next, we analyze the complexity of this optimization problem and show NP-hardness. Further, we discuss how to integrate conventional feature-selection methods as objectives. Finally, we evaluate alternative feature selection with 30 classification datasets. We observe that alternative feature sets may indeed have high prediction quality, and we analyze several factors influencing this outcome.

artificial intelligence, feature-set quality, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2307.11607

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(35 more...)

Genre:

Research Report (1.00)
Overview (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Improving Uncertainty Quantification of Variance Networks by Tree-Structured Learning

Ma, Wenxuan, Yan, Xing, Zhang, Kun

arXiv.org Artificial IntelligenceJul-19-2023

To improve the uncertainty quantification of variance networks, we propose a novel tree-structured local neural network model that partitions the feature space into multiple regions based on uncertainty heterogeneity. A tree is built upon giving the training data, whose leaf nodes represent different regions where region-specific neural networks are trained to predict both the mean and the variance for quantifying uncertainty. The proposed Uncertainty-Splitting Neural Regression Tree (USNRT) employs novel splitting criteria. At each node, a neural network is trained on the full data first, and a statistical test for the residuals is conducted to find the best split, corresponding to the two sub-regions with the most significant uncertainty heterogeneity between them. USNRT is computationally friendly because very few leaf nodes are sufficient and pruning is unnecessary. Furthermore, an ensemble version can be easily constructed to estimate the total uncertainty including the aleatory and epistemic. On extensive UCI datasets, USNRT or its ensemble shows superior performance compared to some recent popular methods for quantifying uncertainty with variances. Through comprehensive visualization and analysis, we uncover how USNRT works and show its merits, revealing that uncertainty heterogeneity does exist in many datasets and can be learned by USNRT.

artificial intelligence, decision tree learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2212.12658

Country:

Asia > China > Beijing > Beijing (0.06)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Strong Optimal Classification Trees

Aghaei, Sina, Gómez, Andrés, Vayanos, Phebe

arXiv.org Artificial IntelligenceJul-18-2023

Decision trees are among the most popular machine learning models and are used routinely in applications ranging from revenue management and medicine to bioinformatics. In this paper, we consider the problem of learning optimal binary classification trees with univariate splits. Literature on the topic has burgeoned in recent years, motivated both by the empirical suboptimality of heuristic approaches and the tremendous improvements in mixed-integer optimization (MIO) technology. Yet, existing MIO-based approaches from the literature do not leverage the power of MIO to its full extent: they rely on weak formulations, resulting in slow convergence and large optimality gaps. To fill this gap in the literature, we propose an intuitive flow-based MIO formulation for learning optimal binary classification trees. Our formulation can accommodate side constraints to enable the design of interpretable and fair decision trees. Moreover, we show that our formulation has a stronger linear optimization relaxation than existing methods in the case of binary data. We exploit the decomposable structure of our formulation and max-flow/min-cut duality to derive a Benders' decomposition method to speed-up computation. We propose a tailored procedure for solving each decomposed subproblem that provably generates facets of the feasible set of the MIO as constraints to add to the main problem. We conduct extensive computational experiments on standard benchmark datasets on which we show that our proposed approaches are 29 times faster than state-of-the-art MIO-based techniques and improve out-of-sample performance by up to 8%.

artificial intelligence, latexit sha1, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2103.15965

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Monterey County > Monterey (0.04)

Genre: Research Report (1.00)

Industry:

Law (0.67)
Health & Medicine > Therapeutic Area > Oncology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Using Decision Trees for Interpretable Supervised Clustering

Kokash, Natallia, Makhnist, Leonid

arXiv.org Artificial IntelligenceJul-16-2023

In this paper, we address an issue of finding explainable clusters of class-uniform data in labelled datasets. The issue falls into the domain of interpretable supervised clustering. Unlike traditional clustering, supervised clustering aims at forming clusters of labelled data with high probability densities. We are particularly interested in finding clusters of data of a given class and describing the clusters with the set of comprehensive rules. We propose an iterative method to extract high-density clusters with the help of decisiontree-based classifiers as the most intuitive learning method, and discuss the method of node selection to maximize quality of identified groups.

artificial intelligence, decision tree learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2307.08104

Country:

Europe > Netherlands > North Holland > Amsterdam (0.06)
Asia > China > Shanghai > Shanghai (0.05)
South America > Peru (0.04)
(34 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Batch Updating of a Posterior Tree Distribution over a Meta-Tree

Nakahara, Yuta, Matsushima, Toshiyasu

arXiv.org Artificial IntelligenceJul-16-2023

Previously, we proposed a probabilistic data generation model represented by an unobservable tree and a sequential updating method to calculate a posterior distribution over a set of trees. The set is called a meta-tree. In this paper, we propose a more efficient batch updating method.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Artificial Intelligence

2303.09705

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.16)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)

Add feedback

Real-time Traffic Classification for 5G NSA Encrypted Data Flows With Physical Channel Records

Fei, Xiao, Martins, Philippe, Lu, Jialiang

arXiv.org Artificial IntelligenceJul-15-2023

The classification of fifth-generation New-Radio (5G-NR) mobile network traffic is an emerging topic in the field of telecommunications. It can be utilized for quality of service (QoS) management and dynamic resource allocation. However, traditional approaches such as Deep Packet Inspection (DPI) can not be directly applied to encrypted data flows. Therefore, new real-time encrypted traffic classification algorithms need to be investigated to handle dynamic transmission. In this study, we examine the real-time encrypted 5G Non-Standalone (NSA) application-level traffic classification using physical channel records. Due to the vastness of their features, decision-tree-based gradient boosting algorithms are a viable approach for classification. We generate a noise-limited 5G NSA trace dataset with traffic from multiple applications. We develop a new pipeline to convert sequences of physical channel records into numerical vectors. A set of machine learning models are tested, and we propose our solution based on Light Gradient Boosting Machine (LGBM) due to its advantages in fast parallel training and low computational burden in practical scenarios. Our experiments demonstrate that our algorithm can achieve 95% accuracy on the classification task with a state-of-the-art response time as quick as 10ms.

artificial intelligence, classification, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2307.07756

Country:

Asia > South Korea (0.14)
Asia > China > Shanghai > Shanghai (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Add feedback

The Growth of E-Bike Use: A Machine Learning Approach

Gupta, Aditya, Chitgopekar, Samarth, Kim, Alexander, Jiang, Joseph, Wang, Megan, Grattoni, Christopher

arXiv.org Artificial IntelligenceJul-14-2023

We present our work on electric bicycles (e-bikes) and their implications for policymakers in the United States. E-bikes have gained significant popularity as a fast and eco-friendly transportation option. As we strive for a sustainable energy plan, understanding the growth and impact of e-bikes is crucial for policymakers. Our mathematical modeling offers insights into the value of e-bikes and their role in the future. Using an ARIMA model, a supervised machine-learning algorithm, we predicted the growth of e-bike sales in the U.S. Our model, trained on historical sales data from January 2006 to December 2022, projected sales of 1.3 million units in 2025 and 2.113 million units in 2028. To assess the factors contributing to e-bike usage, we employed a Random Forest regression model. The most significant factors influencing e-bike sales growth were disposable personal income and popularity. Furthermore, we examined the environmental and health impacts of e-bikes. Through Monte Carlo simulations, we estimated the reduction in carbon emissions due to e-bike use and the calories burned through e-biking. Our findings revealed that e-bike usage in the U.S. resulted in a reduction of 15,737.82 kilograms of CO2 emissions in 2022. Additionally, e-bike users burned approximately 716,630.727 kilocalories through their activities in the same year. Our research provides valuable insights for policymakers, emphasizing the potential of e-bikes as a sustainable transportation solution. By understanding the growth factors and quantifying the environmental and health benefits, policymakers can make informed decisions about integrating e-bikes into future energy and transportation strategies.

artificial intelligence, decision tree learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2308.02034

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.27)
North America > United States > Vermont (0.05)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)
Energy (1.00)
Transportation > Passenger (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

Automatically Reconciling the Trade-off between Prediction Accuracy and Earliness in Prescriptive Business Process Monitoring

Metzger, Andreas, Kley, Tristan, Rothweiler, Aristide, Pohl, Klaus

arXiv.org Artificial IntelligenceJul-12-2023

Prescriptive business process monitoring provides decision support to process managers on when and how to adapt an ongoing business process to prevent or mitigate an undesired process outcome. We focus on the problem of automatically reconciling the trade-off between prediction accuracy and prediction earliness in determining when to adapt. Adaptations should happen sufficiently early to provide enough lead time for the adaptation to become effective. However, earlier predictions are typically less accurate than later predictions. This means that acting on less accurate predictions may lead to unnecessary adaptations or missed adaptations. Different approaches were presented in the literature to reconcile the trade-off between prediction accuracy and earliness. So far, these approaches were compared with different baselines, and evaluated using different data sets or even confidential data sets. This limits the comparability and replicability of the approaches and makes it difficult to choose a concrete approach in practice. We perform a comparative evaluation of the main alternative approaches for reconciling the trade-off between prediction accuracy and earliness. Using four public real-world event log data sets and two types of prediction models, we assess and compare the cost savings of these approaches. The experimental results indicate which criteria affect the effectiveness of an approach and help us state initial recommendations for the selection of a concrete approach in practice.

data mining, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.is.2023.102254

2307.05939

Country:

Oceania > Australia (0.28)
Europe > Italy (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
(8 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Add feedback

Physics-constrained Random Forests for Turbulence Model Uncertainty Estimation

Matha, Marcel, Morsbach, Christian

arXiv.org Artificial IntelligenceJul-11-2023

Data-driven approaches, aided by high-fidelity simulations and Machine Learning (ML), are gaining popularity To achieve virtual certification for industrial design, in RANS turbulence modeling (Heyse et al., 2021; quantifying the uncertainties in simulationdriven Matha & Kucharczyk, 2022). The present study, which processes is crucial. We discuss a physicsconstrained was the foundation of the CFD application results in our approach to account for epistemic previous paper (Matha et al., 2023), focuses on the use of uncertainty of turbulence models. In order to data-driven methods in order to identify flow regions with eliminate user input, we incorporate a data-driven potential turbulence model prediction inaccuracies.

artificial intelligence, machine learning, simulation, (12 more...)

arXiv.org Artificial Intelligence

2306.1337

Country:

Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.04)

Genre: Research Report (0.82)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.42)

Add feedback