Decision Tree Learning
An End-to-End Approach for Online Decision Mining and Decision Drift Analysis in Process-Aware Information Systems: Extended Version
Scheibel, Beate, Rinderle-Ma, Stefanie
Decision mining enables the discovery of decision rules from event logs or streams, and constitutes an important part of in-depth analysis and optimisation of business processes. So far, decision mining has been applied only in an ex-post way, resulting in a snapshot of decision rules for the given chunk of log data. Online decision mining, by contrast, enables continuous monitoring of decision rule evolution and decision drift. Hence this paper presents an end-to-end approach for the discovery as well as monitoring of decision points and the corresponding decision rules during runtime, bridging the gap between online control flow discovery and decision mining. The approach provides automatic decision support for process-aware information systems with efficient decision drift discovery and monitoring. For monitoring, not only the performance of decision rules, in terms of accuracy, is taken into account, but also the occurrence of data elements and changes in branching frequency. The paper provides two algorithms, which are evaluated on four synthetic data sets and one real-life data set, showing feasibility and applicability of the approach. Overall, the approach fosters the understanding of decisions in business processes and hence contributes to an improved human-process interaction.
KDSM: An uplift modeling framework based on knowledge distillation and sample matching
Sun, Chang, Li, Qianying, Wang, Guanxiang, Xu, Sihao, Liu, Yitong
Uplift modeling aims to estimate the treatment effect on individuals and is widely applied on e-commerce platforms to target persuadable customers and maximize the return of marketing activities. Among the existing uplift modeling methods, tree-based methods are adept at fitting increment and generalization, while neural-network-based models excel at predicting absolute value and precision, and these advantages have not been fully explored and combined. Also, the lack of counterfactual sample pairs is the root challenge in uplift modeling. In this paper, we propose an uplift modeling framework based on Knowledge Distillation and Sample Matching (KDSM). The teacher model is the uplift decision tree (UpliftDT), whose structure is exploited to construct counterfactual sample pairs, and the pairwise incremental prediction is treated as another objective for the student model. Under the idea of multitask learning, the student model can achieve better performance on generalization and even surpass the teacher. Extensive offline experiments validate the universality of different combinations of teachers and student models and the superiority of KDSM measured against the baselines. In online A/B testing, the cost of each incremental room night is reduced by 6.5%.
Very fast, approximate counterfactual explanations for decision forests
Carreira-Perpiñán, Miguel Á., Hada, Suryabhan Singh
We consider finding a counterfactual explanation for a classification or regression forest, such as a random forest. This requires solving an optimization problem to find the closest input instance to a given instance for which the forest outputs a desired value. Finding an exact solution has a cost that is exponential in the number of leaves in the forest. We propose a simple but very effective approach: we constrain the optimization to only those input space regions defined by the forest that are populated by actual data points. The problem reduces to a form of nearest-neighbor search using a certain distance on a certain dataset. This has two advantages: first, the solution can be found very quickly, scaling to large forests and high-dimensional data, and enabling interactive use. Second, the solution found is more likely to be realistic in that it is guided towards high-density areas of input space.
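To make the nearest-neighbor idea concrete, here is a minimal sketch in plain Python. It is a simplification, not the authors' method: it returns the closest existing data point that the forest maps to the desired class, whereas the abstract describes a search over the forest-defined regions that contain data. The toy trees, the `DATA` set, and all function names are hypothetical.

```python
import math

# Hypothetical toy "forest": each tree maps a 2-D point to a class label (0 or 1).
def tree_a(x):
    return 1 if x[0] > 0.5 else 0

def tree_b(x):
    return 1 if x[1] > 0.5 else 0

def tree_c(x):
    return 1 if x[0] + x[1] > 1.0 else 0

FOREST = [tree_a, tree_b, tree_c]

def forest_predict(x):
    """Majority vote over the toy trees."""
    votes = sum(tree(x) for tree in FOREST)
    return 1 if votes * 2 > len(FOREST) else 0

def nearest_counterfactual(x, data, target):
    """Return the data point closest to x (Euclidean distance) that the
    forest assigns to `target`, or None if no such point exists."""
    candidates = [p for p in data if forest_predict(p) == target]
    if not candidates:
        return None
    return min(candidates, key=lambda p: math.dist(x, p))

# Assumed example data; in practice this would be the training set.
DATA = [(0.1, 0.2), (0.4, 0.3), (0.6, 0.7), (0.9, 0.9), (0.7, 0.2)]
query = (0.3, 0.3)
counterfactual = nearest_counterfactual(query, DATA, target=1)
```

Because the search is restricted to points that actually occur in the data, the result is always a realistic instance, at the cost of possibly missing a closer point in an unpopulated region.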
T2G-Former: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction
Yan, Jiahuan, Chen, Jintai, Wu, Yixuan, Chen, Danny Z., Wu, Jian
Recent development of deep neural networks (DNNs) for tabular learning has largely benefited from the capability of DNNs for automatic feature interaction. However, the heterogeneous nature of tabular features makes such features relatively independent, and developing effective methods to promote tabular feature interaction remains an open problem. In this paper, we propose a novel Graph Estimator, which automatically estimates the relations among tabular features and builds graphs by assigning edges between related features. Such relation graphs organize independent tabular features into a kind of graph data such that interaction of nodes (tabular features) can be conducted in an orderly fashion. Based on our proposed Graph Estimator, we present a bespoke Transformer network tailored for tabular learning, called T2G-Former, which processes tabular data by performing tabular feature interaction guided by the relation graphs. A specific Cross-level Readout collects salient features predicted by the layers in T2G-Former across different levels, and attains global semantics for final prediction. Comprehensive experiments show that our T2G-Former achieves superior performance among DNNs and is competitive with non-deep Gradient Boosted Decision Tree models.
10 Decision Trees are Better Than 1
In the previous article of this series, I reviewed decision trees and how we can use them to make predictions. However, for many real-world problems, a single decision tree is often prone to bias and overfitting. We saw this in our example from the last blog, where even after a little hyperparameter tuning, our decision tree was still wrong 35% of the time. A solution to this poor performance problem is to use an ensemble of decision trees rather than just one. The key benefit of tree ensembles is they generally have better performance than a single decision tree. While there are many ways we could combine a set of decision trees to improve performance, two popular methods are bagging and boosting.
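Here's a rough sketch of what bagging looks like in code, using only the Python standard library. To keep it short, each "tree" is a depth-1 decision stump on a single feature, and the dataset, function names, and seed are all made up for illustration; a real project would more likely reach for a library implementation.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold t that minimises errors of the rule
    'predict 1 if x >= t, else 0' on the given (x, label) pairs."""
    thresholds = sorted({x for x, _ in sample})

    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in sample)

    return min(thresholds, key=errors)

def fit_bagged_stumps(data, n_trees=10, seed=0):
    """Bagging: fit each stump on a bootstrap sample, i.e. a sample drawn
    with replacement that has the same size as the original data."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]
        stumps.append(fit_stump(sample))
    return stumps

def predict(stumps, x):
    """Combine the stumps by majority vote."""
    votes = Counter(1 if x >= t else 0 for t in stumps)
    return votes.most_common(1)[0][0]

# Toy dataset (assumed): the label is 1 when x >= 5.
data = [(x, 1 if x >= 5 else 0) for x in range(10)]
ensemble = fit_bagged_stumps(data, n_trees=10)
```

Each stump sees a slightly different resampling of the data, so their individual mistakes differ, and the vote averages them out. Boosting, by contrast, fits the trees one after another, with each new tree focusing on the errors of the ones before it.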
A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression
Gerstorfer, Yannick, Krieg, Lena, Hahn-Klimroth, Max
In many studies, we want to determine the influence of certain features on a dependent variable. More specifically, we are interested in the strength of the influence -- i.e., is the feature relevant? -- and, if so, how the feature influences the dependent variable. Recently, data-driven approaches such as random forest regression have found their way into applications (Boulesteix et al., 2012). These models allow one to derive measures of feature importance directly, which are a natural indicator of the strength of the influence. For the relevant features, the correlation or rank correlation between the feature and the dependent variable has typically been used to determine the nature of the influence. More recent methods, some of which can also measure interactions between features, are based on a modeling approach. In particular, when machine learning models are used, SHAP scores are a recent and prominent method to determine these trends (Lundberg et al., 2017). In this paper, we introduce a novel notion of feature importance based on the well-studied Gram-Schmidt decorrelation method. Furthermore, we propose two estimators for identifying trends in the data using random forest regression, the so-called absolute and relative transversal rate. We empirically compare the properties of our estimators with those of well-established estimators on a variety of synthetic and real-world datasets.
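For readers unfamiliar with Gram-Schmidt decorrelation, the sketch below shows the generic procedure on two correlated feature columns. It illustrates only the decorrelation step, under the assumption that features are centered first; it is not the importance measure proposed in the paper, whose definition the abstract does not give. The sample columns and function names are hypothetical.

```python
def center(col):
    """Subtract the mean so that inner products correspond to covariances."""
    mean = sum(col) / len(col)
    return [v - mean for v in col]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(columns):
    """Return centered columns in which each column has had its projection
    onto all earlier columns removed, so the results are mutually
    uncorrelated. Assumes no column is constant or an exact linear
    combination of earlier ones (otherwise a division by zero occurs)."""
    ortho = []
    for col in map(center, columns):
        residual = col[:]
        for q in ortho:
            coeff = dot(residual, q) / dot(q, q)
            residual = [r - coeff * v for r, v in zip(residual, q)]
        ortho.append(residual)
    return ortho

# Two strongly correlated toy features (assumed values).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
decorrelated = gram_schmidt([x1, x2])
```

After the procedure, the second column contains only the part of `x2` that `x1` does not explain linearly. The result depends on the order in which the columns are processed.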
VW unveils second-gen ID.3 EV and an app store for its cars
After months of teasers, the company has introduced a second-generation ID.3 that addresses criticisms of the first model. The new compact car offers a "sharper" design with improved aerodynamics and a higher-quality (and heavily recycled) interior. More importantly, VW has upgraded the technology -- including its software, which garnered a long list of complaints from drivers. The second-gen ID.3 includes the "latest software," with a simpler layout, better performance and over-the-air updates. The 12-inch infotainment display is now standard.
Interpretability and Explainability: A Machine Learning Zoo Mini-tour
Marcinkevičs, Ričards, Vogt, Julia E.
In this literature review, we provided a survey of interpretable and explainable machine learning methods (see Tables 1 and 2 for the summary of the techniques), described the most common goals and desiderata for these techniques, motivated their relevance in several fields of application, and discussed their quantitative evaluation. Interpretability and explainability remain an active area of research, especially in the face of recent rapid progress in designing highly performant predictive models and the inevitable infusion of machine learning into other domains, where decisions have far-reaching consequences. For years the field has been challenged by a lack of clear definitions for interpretability or explainability, these terms being often wielded "in a quasi-mathematical way" [6, 122]. For many techniques, there still exist no satisfactory functionally-grounded evaluation criteria and universally accepted benchmarks, hindering reproducibility and model comparison. Moreover, meaningful adaptations of these methods to 'real-world' machine learning systems and data analysis problems largely remain a matter for the future. It has been argued that, for successful and widespread use of interpretable and explainable machine learning models, stakeholders need to be involved in the discussion [4, 122]. A meaningful and equal collaboration between machine learning researchers and stakeholders from various domains, such as medicine, natural sciences, and law, is a logical next step within the evolution of interpretable and explainable ML.
Tightness of prescriptive tree-based mixed-integer optimization formulations
We focus on modeling the relationship between an input feature vector and the predicted outcome of a trained decision tree using mixed-integer optimization. This can be used in many practical applications where a decision tree or tree ensemble is incorporated into an optimization problem to model the predicted outcomes of a decision. We propose tighter mixed-integer optimization formulations than those previously introduced. Existing formulations can be shown to have linear relaxations that have fractional extreme points, even for the simple case of modeling a single decision tree. A formulation we propose, based on a projected union of polyhedra approach, is ideal for a single decision tree. While the formulation is generally not ideal for tree ensembles or if additional constraints are added, it generally has fewer extreme points, leading to a faster time to solve, particularly if the formulation has relatively few trees. However, previous work has shown that formulations based on a binary representation of the feature vector perform well computationally and hence are attractive for use in practical applications. We present multiple approaches to tighten existing formulations with binary vectors, and show that fractional extreme points are removed when there are multiple splits on the same feature. At an extreme, we prove that this results in ideal formulations for tree ensembles modeling a one-dimensional feature vector. Building on this result, we also show via numerical simulations that these additional constraints result in significantly tighter linear relaxations when the feature vector is low dimensional. We also present instances where the time to solve to optimality is significantly improved using these formulations.
Fusion of ML with numerical simulation for optimized propeller design
Vardhan, Harsh, Volgyesi, Peter, Sztipanovits, Janos
In computer-aided engineering design, the goal of a designer is to find an optimal design for a given requirement using a numerical simulator in the loop with an optimization method. A good design optimization process is one that reduces the time from inception to design. In this work, we consider a class of design problems that are computationally cheap to evaluate but have a high-dimensional design space. In such cases, traditional surrogate-based optimization does not offer any benefits. We propose an alternative way to use an ML model as a surrogate for the design process: it formulates the search problem as an inverse problem and can save time by finding the optimal design or at least a good initial seed design for optimization. By using this trained surrogate model with the traditional optimization method, we can get the best of both worlds. We call this Surrogate Assisted Optimization (SAO), a hybrid approach that mixes an ML surrogate with the traditional optimization method. Empirical evaluations on propeller design problems show that a more efficient design can be found in fewer evaluations using SAO.