Decision Tree Learning
Neural Networks as Decision Trees
The recent boom in AI has clearly shown the power of deep neural networks in various tasks, especially in the field of classification problems where the data is high-dimensional and has complex, non-linear relationships with the target variables. However, explaining the decisions of any neural classifier is an incredibly hard problem. While many post-hoc methods such as DeepLift [2] and Layer-Wise Relevance Propagation [3] can help with explaining individual decisions, explaining the global decision mechanisms (or what the model generally looks for) is much more difficult. Because of this, many practitioners in high-stakes fields instead opt for more interpretable models like basic Decision Trees since the decision hierarchy can be clearly visualized and understood by stakeholders. However, basic trees by themselves often do not provide enough accuracy for the task at hand and often ensemble methods like Bagging or Boosting are used to improve the model's performance.
From Conception to Deployment: Intelligent Stroke Prediction Framework using Machine Learning and Performance Evaluation
Ismail, Leila, Materwala, Huned
Stroke is the second leading cause of death worldwide. Machine learning classification algorithms have been widely adopted for stroke prediction. However, these algorithms were evaluated using different datasets and evaluation metrics. Moreover, there is no comprehensive framework for stroke data analytics. This paper proposes an intelligent stroke prediction framework based on a critical examination of machine learning prediction algorithms in the literature. The five most used machine learning algorithms for stroke prediction are evaluated using a unified setup for objective comparison. Comparative analysis and numerical results reveal that the Random Forest algorithm is best suited for stroke prediction.
Four Factors to consider when choosing b/w Decision Tree and Random Forest
The decision to choose between Random Forest and Decision Tree models depends on the complexity of the problem, the size of the dataset, the interpretability of the model, and the trade-off between accuracy and computational efficiency. Complexity of the problem: Decision trees are simpler and easier to interpret, making them a good choice for smaller and less complex problems. However, for larger and more complex problems, Random Forest models can provide better accuracy due to their ability to combine multiple decision trees. Size of the dataset: Decision trees can be sensitive to noise and outliers, and may overfit the data if the dataset is too small. Random Forest models can be more robust to noise and overfitting, making them a good choice for smaller datasets.
Lifting uniform learners via distributional decomposition
Blanc, Guy, Lange, Jane, Malik, Ali, Tan, Li-Yang
We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$, running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose pmfs are computed by depth-$d$ decision trees, where $m$ is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from $\mathcal{D}$, and for general ones it uses subcube conditioning samples. A key technical ingredient is an algorithm which, given the aforementioned access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art -- results of independent interest in distribution learning.
Local Interpretability of Random Forests for Multi-Target Regression
Bardos, Avraam, Mylonas, Nikolaos, Mollas, Ioannis, Tsoumakas, Grigorios
Multi-target regression is useful in a plethora of applications. Although random forest models perform well in these tasks, they are often difficult to interpret. Interpretability is crucial in machine learning, especially when it can directly impact human well-being. Although model-agnostic techniques exist for multi-target regression, specific techniques tailored to random forest models are not available. To address this issue, we propose a technique that provides rule-based interpretations for instances made by a random forest model for multi-target regression, influenced by a recent model-specific technique for random forest interpretability. The proposed technique was evaluated through extensive experiments and shown to offer competitive interpretations compared to state-of-the-art techniques.
Three-way causal attribute partial order structure analysis
Zaifa, Xue, Huibin, Lu, Tao, Zhang, Tao, Li, Xin, Lu
As an emerging concept cognitive learning model, partial order formal structure analysis (POFSA) has been widely used in the field of knowledge processing. In this paper, we propose the method named three-way causal attribute partial order structure (3WCAPOS) to evolve the POFSA from set coverage to causal coverage in order to increase the interpretability and classification performance of the model. First, the concept of causal factor (CF) is proposed to evaluate the causal correlation between attributes and decision attributes in the formal decision context. Then, combining CF with attribute partial order structure, the concept of causal attribute partial order structure is defined and makes set coverage evolve into causal coverage. Finally, combined with the idea of three-way decision, 3WCAPOS is formed, which makes the purity of nodes in the structure clearer and the changes between levels more obviously. In addition, the experiments are carried out from the classification ability and the interpretability of the structure through the six datasets. Through these experiments, it is concluded the accuracy of 3WCAPOS is improved by 1% - 9% compared with classification and regression tree, and more interpretable and the processing of knowledge is more reasonable compared with attribute partial order structure. Keywords: Formal concept analysis, Three-way decision, Attribute partial order structure, Causal inference, Causal factor 1. Introduction Attribute partial order structure analysis (APOSA) is an important method in the field of Concept-cognitive learning (CCL) [4, 31, 32, 19], which explores the relationship between attributes from the perspective of human cognition.
Using Connected Vehicle Trajectory Data to Evaluate the Effects of Speeding
Ugan, Jorge, Abdel-Aty, Mohamed, Islam, Zubayer
Speeding has been and continues to be a major contributing factor to traffic fatalities. Various transportation agencies have proposed speed management strategies to reduce the amount of speeding on arterials. While there have been various studies done on the analysis of speeding proportions above the speed limit, few studies have considered the effect on the individual's journey. Many studies utilized speed data from detectors, which is limited in that there is no information of the route that the driver took. This study aims to explore the effects of various roadway features an individual experiences for a given journey on speeding proportions. Connected vehicle trajectory data was utilized to identify the path that a driver took, along with the vehicle related variables. The level of speeding proportion is predicted using multiple learning models. The model with the best performance, Extreme Gradient Boosting, achieved an accuracy of 0.756. The proposed model can be used to understand how the environment and vehicle's path effects the drivers' speeding behavior, as well as predict the areas with high levels of speeding proportions. The results suggested that features related to an individual driver's trip, i.e., total travel time, has a significant contribution towards speeding. Features that are related to the environment of the individual driver's trip, i.e., proportion of residential area, also had a significant effect on reducing speeding proportions. It is expected that the findings could help inform transportation agencies more on the factors related to speeding for an individual driver's trip.
Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions
Agarwal, Abhineet, Agarwal, Anish, Vijaykumar, Suhas
We consider a setting with $N$ heterogeneous units and $p$ interventions. Our goal is to learn unit-specific potential outcomes for any combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters. Choosing combinations of interventions is a problem that naturally arises in many applications such as factorial design experiments, recommendation engines (e.g., showing a set of movies that maximizes engagement for users), combination therapies in medicine, selecting important features for ML models, etc. Running $N \times 2^p$ experiments to estimate the various parameters is infeasible as $N$ and $p$ grow. Further, with observational data there is likely confounding, i.e., whether or not a unit is seen under a combination is correlated with its potential outcome under that combination. To address these challenges, we propose a novel model that imposes latent structure across both units and combinations. We assume latent similarity across units (i.e., the potential outcomes matrix is rank $r$) and regularity in how combinations interact (i.e., the coefficients in the Fourier expansion of the potential outcomes is $s$ sparse). We establish identification for all causal parameters despite unobserved confounding. We propose an estimation procedure, Synthetic Combinations, and establish finite-sample consistency under precise conditions on the observation pattern. Our results imply Synthetic Combinations consistently estimates unit-specific potential outcomes given $\text{poly}(r) \times (N + s^2p)$ observations. In comparison, previous methods that do not exploit structure across both units and combinations have sample complexity scaling as $\min(N \times s^2p, \ \ r \times (N + 2^p))$. We use Synthetic Combinations to propose a data-efficient experimental design mechanism for combinatorial causal inference. We corroborate our theoretical findings with numerical simulations.
Counterfactually Fair Regression with Double Machine Learning
Counterfactual fairness is an approach to AI fairness that tries to make decisions based on the outcomes that an individual with some kind of sensitive status would have had without this status. This paper proposes Double Machine Learning (DML) Fairness which analogises this problem of counterfactual fairness in regression problems to that of estimating counterfactual outcomes in causal inference under the Potential Outcomes framework. It uses arbitrary machine learning methods to partial out the effect of sensitive variables on nonsensitive variables and outcomes. Assuming that the effects of the two sets of variables are additively separable, outcomes will be approximately equalised and individual-level outcomes will be counterfactually fair. This paper demonstrates the approach in a simulation study pertaining to discrimination in workplace hiring and an application on real data estimating the GPAs of law school students. It then discusses when it is appropriate to apply such a method to problems of real-world discrimination where constructs are conceptually complex and finally, whether DML Fairness can achieve justice in these settings.
QUBO Decision Tree: Annealing Machine Extends Decision Tree Splitting
Yawata, Koichiro, Osakabe, Yoshihiro, Okuyama, Takuya, Asahara, Akinori
This paper proposes an extension of regression trees by quadratic unconstrained binary optimization (QUBO). Regression trees are very popular prediction models that are trainable with tabular datasets, but their accuracy is insufficient because the decision rules are too simple. The proposed method extends the decision rules in decision trees to multi-dimensional boundaries. Such an extension is generally unimplementable because of computational limitations, however, the proposed method transforms the training process to QUBO, which enables an annealing machine to solve this problem.