treeshap
Linear TreeShap
Yu, Peng
Decision trees are well-known due to their ease of interpretability. To improve accuracy, we need to grow deep trees or ensembles of trees. These are hard to interpret, offsetting their original benefits. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models. They provide a linear weighting to features independent of the tree structure. The rise in popularity is mainly due to TreeShap, which solves a general exponential complexity problem in polynomial time. Following extensive adoption in the industry, more efficient algorithms are required. This paper presents a more efficient and straightforward algorithm: Linear TreeShap. Like TreeShap, Linear TreeShap is exact and requires the same amount of memory.
- Oceania > New Zealand > North Island > Waikato (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
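To make the "general exponential complexity problem" in the abstract above concrete, here is a minimal sketch of exact Shapley values computed by brute-force coalition enumeration. It is not TreeShap or Linear TreeShap. The toy tree, the background set, and the interventional value function are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values by enumerating every coalition (exponential in n_features)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight shared by all coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_tree(x):
    # Hypothetical depth-2 tree on two binary features
    if x[0] <= 0.5:
        return 0.0
    return 10.0 if x[1] > 0.5 else 2.0


# Hypothetical background data and the instance to explain
background = [(0, 0), (0, 1), (1, 0), (1, 1)]
x = (1, 1)


def value(coalition):
    # Features in the coalition take their value from x; the rest are
    # averaged over the background rows (an interventional value function).
    total = 0.0
    for b in background:
        z = tuple(x[j] if j in coalition else b[j] for j in range(len(x)))
        total += toy_tree(z)
    return total / len(background)


phi = shapley_values(value, n_features=2)
print(phi)
```

The attributions sum to the prediction for `x` minus the average prediction over the background. The nested loop visits every subset of the other features, which is the cost that polynomial-time tree algorithms avoid by exploiting the tree structure.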
Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm
Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with critical decisions at stake. The Hoeffding or ANOVA functional decomposition is a powerful explainability method, as it breaks down black-box models into a unique sum of lower-dimensional functions, provided that input variables are independent. In standard learning settings, input variables are often dependent, and the Hoeffding decomposition is generalized through hierarchical orthogonality constraints. Such generalization leads to unique and sparse decompositions with well-defined main effects and interactions. However, the practical estimation of this decomposition from a data sample is still an open problem. Therefore, we introduce the TreeHFD algorithm to estimate the Hoeffding decomposition of a tree ensemble from a data sample. We show the convergence of TreeHFD, along with the main properties of orthogonality, sparsity, and causal variable selection. The high performance of TreeHFD is demonstrated through experiments on both simulated and real data, using our treehfd Python package (https://github.com/ThalesGroup/treehfd). Besides, we empirically show that the widely used TreeSHAP method, based on Shapley values, is strongly connected to the Hoeffding decomposition.
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
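For readers unfamiliar with the Hoeffding (ANOVA) decomposition mentioned in the abstract above, the following sketch shows the classical independent-input case only. It is not the TreeHFD algorithm and does not handle dependent inputs. The grid, the function `F`, and the assumption of independent, uniformly distributed inputs are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical model evaluated on a 3x4 grid; rows index x1, columns index x2.
# Both inputs are assumed independent and uniform over their grid points.
x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([0.0, 1.0, 2.0, 3.0])
F = x1[:, None] + 2.0 * x2[None, :] + x1[:, None] * x2[None, :]

f0 = F.mean()                  # constant term: overall mean
f1 = F.mean(axis=1) - f0       # main effect of x1 (average over x2)
f2 = F.mean(axis=0) - f0       # main effect of x2 (average over x1)
f12 = F - f0 - f1[:, None] - f2[None, :]  # remaining interaction term

# The components sum back to the original function
reconstruction = f0 + f1[:, None] + f2[None, :] + f12
print(np.allclose(reconstruction, F))
```

Under these assumptions each main effect has zero mean, and the interaction term averages to zero along either axis, which is what makes the decomposition unique in the independent case.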
STRIDE: Subset-Free Functional Decomposition for XAI in Tabular Settings
Most explainable AI (XAI) frameworks are limited in their expressiveness, summarizing complex feature effects as single scalar values ϕ_i. This approach answers "what" features are important but fails to reveal "how" they interact. Furthermore, methods that attempt to capture interactions, like those based on Shapley values, often face an exponential computational cost. We present STRIDE, a scalable framework that addresses both limitations by reframing explanation as a subset-enumeration-free, orthogonal "functional decomposition" in a Reproducing Kernel Hilbert Space (RKHS). In the tabular setups we study, STRIDE analytically computes functional components f_S(x_S) via a recursive kernel-centering procedure. The approach is model-agnostic and theoretically grounded with results on orthogonality and L^2 convergence. In tabular benchmarks (10 datasets, median over 10 seeds), STRIDE attains a 3.0 times median speedup over TreeSHAP and a mean R^2=0.93 for reconstruction. We also introduce "component surgery", a diagnostic that isolates a learned interaction and quantifies its contribution; on California Housing, removing a single interaction reduces test R^2 from 0.019 to 0.027.
- North America > United States > California (0.25)
- Europe > United Kingdom (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Oceania > New Zealand > North Island > Waikato (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)
- Information Technology > Data Science (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Surrogate Interpretable Graph for Random Decision Forests
Dubey, Akshat, Anžel, Aleksandar, Hattab, Georges
The field of health informatics has been profoundly influenced by the development of random forest models, which have led to significant advances in the interpretability of feature interactions. These models are characterized by their robustness to overfitting and their ease of parallelization, making them particularly useful in this domain. However, the increasing number of features and estimators in random forests can prevent domain experts from accurately interpreting global feature interactions, thereby compromising trust and regulatory compliance. A method called the surrogate interpretability graph has been developed to address this issue. It uses graphs and mixed-integer linear programming to analyze and visualize feature interactions. This improves their interpretability by visualizing the feature usage per decision-feature-interaction table and the most dominant hierarchical decision feature interactions for predictions. The implementation of a surrogate interpretable graph enhances global interpretability, which is critical for such a high-stakes domain.
- North America > United States (0.14)
- Europe > Germany > Berlin (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
Linear tree shap
Decision trees are well-known due to their ease of interpretability. To improve accuracy, we need to grow deep trees or ensembles of trees. These are hard to interpret, offsetting their original benefits. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models. They provide a linear weighting to features independent of the tree structure. The rise in popularity is mainly due to TreeShap, which solves a general exponential complexity problem in polynomial time. Following extensive adoption in the industry, more efficient algorithms are required.
LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method
Madakkatel, Iqbal, Hyppönen, Elina
Shapley values have been used extensively in machine learning, not only to explain black-box machine learning models but also, among other tasks, to conduct model debugging, sensitivity and fairness analyses, and to select important features for robust modelling and for further follow-up analyses. Shapley values satisfy certain axioms that promote fairness in distributing contributions of features toward prediction or reducing error, after accounting for non-linear relationships and interactions when complex machine learning models are employed. Recently, a number of feature selection methods utilising Shapley values have been introduced. Here, we present a novel feature selection method, LLpowershap, which makes use of loss-based Shapley values to identify informative features with minimal noise among the selected sets of features. Our simulation results show that LLpowershap not only identifies a higher number of informative features but also outputs fewer noise features compared to other state-of-the-art feature selection methods. Benchmarking results on four real-world datasets demonstrate higher or on-par predictive performance of LLpowershap compared to other Shapley-based wrapper methods or filter methods.
- Oceania > Australia > South Australia > Adelaide (0.04)
- Europe > United Kingdom (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Middle East > Jordan (0.04)
On marginal feature attributions of tree-based models
Filom, Khashayar, Miroshnikov, Alexey, Kotsiopoulos, Konstandinos, Kannan, Arjun Ravi
Due to their power and ease of use, tree-based machine learning models, such as random forests and gradient-boosted tree ensembles, have become very popular. To interpret them, local feature attributions based on marginal expectations, e.g. marginal (interventional) Shapley, Owen or Banzhaf values, may be employed. Such methods are true to the model and implementation invariant, i.e. dependent only on the input-output function of the model. We contrast this with the popular TreeSHAP algorithm by presenting two (statistically similar) decision trees that compute the exact same function for which the "path-dependent" TreeSHAP yields different rankings of features, whereas the marginal Shapley values coincide. Furthermore, we discuss how the internal structure of tree-based models may be leveraged to help with computing their marginal feature attributions according to a linear game value. One important observation is that these are simple (piecewise-constant) functions with respect to a certain grid partition of the input space determined by the trained model. Another crucial observation, showcased by experiments with XGBoost, LightGBM and CatBoost libraries, is that only a portion of all features appears in a tree from the ensemble. Thus, the complexity of computing marginal Shapley (or Owen or Banzhaf) feature attributions may be reduced. This remains valid for a broader class of game values which we shall axiomatically characterize. A prime example is the case of CatBoost models where the trees are oblivious (symmetric) and the number of features in each of them is no larger than the depth. We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models. This results in a fast, accurate algorithm for estimating these feature attributions.
- Banking & Finance (0.92)
- Health & Medicine (0.67)
- Leisure & Entertainment > Games (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)
Linear TreeShap
Yu, Peng, Xu, Chao, Bifet, Albert, Read, Jesse
Decision trees are well-known due to their ease of interpretability. To improve accuracy, we need to grow deep trees or ensembles of trees. These are hard to interpret, offsetting their original benefits. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models. They provide a linear weighting to features independent of the tree structure. The rise in popularity is mainly due to TreeShap, which solves a general exponential complexity problem in polynomial time. Following extensive adoption in the industry, more efficient algorithms are required. This paper presents a more efficient and straightforward algorithm: Linear TreeShap. Like TreeShap, Linear TreeShap is exact and requires the same amount of memory.
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)