 shapley effect


Cooperative effects in feature importance of individual patterns: application to air pollutants and Alzheimer disease

arXiv.org Artificial Intelligence

In [1] a novel global feature importance method for regression has been introduced for explainable artificial intelligence (XAI) [2], based on recent results which generalize the traditional dyadic description of networks of variables to the higher-order setting [3, 4]. Notably, increasing attention is being devoted to the emergent properties of complex systems, with a prominent role in this literature played by partial information decomposition (PID) [5] and its subsequent developments [6], which exploit information-theoretic tools to reveal high-order dependencies among groups of three or more random variables and describe their synergistic or redundant nature [7-11]. Within this framework, redundancy refers to information retrievable from multiple sources, while synergy refers to statistical relationships existing within the whole system that cannot be observed in its individual parts. The approach described in [1], named Hi-Fi (high-order interactions for feature importance), is rooted in a well-known metric of feature importance named Leave One Covariate Out (LOCO) [12], i.e. the reduction of the prediction error when the feature under consideration is added to the set of all the features used for regression, and proposes an adaptive version of LOCO which provides three scores for each feature: the unique, pure (two-body) influence of the feature on the target, and the contributions stemming from synergistic and redundant interactions with other features. It is worth mentioning that the decomposition of feature importance in [1] also depends on the choice of the hypothesis space for regression, hence it should be assumed that a proper model for the data has been selected.
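The LOCO score mentioned above can be illustrated with a minimal sketch. It covers only plain LOCO, not the adaptive Hi-Fi decomposition of [1], and everything in it is an assumption made for illustration: the hypothesis space is ordinary least squares, the error is the in-sample mean squared error, the data are synthetic, and the helper names `fit_ols`, `mse` and `loco` are hypothetical.

```python
import random

def fit_ols(X, y):
    """Least squares with intercept, via normal equations and Gauss-Jordan elimination."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented system [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def mse(X, y, cols):
    """In-sample mean squared error of a linear fit restricted to the columns in cols."""
    Xs = [[r[c] for c in cols] for r in X]
    beta = fit_ols(Xs, y)
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in Xs]
    return sum((t - p) ** 2 for t, p in zip(y, preds)) / len(y)

def loco(X, y, j):
    """Error reduction obtained when feature j is added to all the other features."""
    all_cols = list(range(len(X[0])))
    rest = [c for c in all_cols if c != j]
    return mse(X, y, rest) - mse(X, y, all_cols)

# Synthetic data (assumption): feature 0 matters most, feature 2 is irrelevant.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(400)]
y = [2.0 * r[0] + 0.5 * r[1] + random.gauss(0, 0.1) for r in X]
scores = [loco(X, y, j) for j in range(3)]
print(scores)
```

Because the two linear models are nested and the error is measured in-sample, each score is non-negative up to rounding; a held-out error estimate would be the more usual choice in practice.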


A new paradigm for global sensitivity analysis

arXiv.org Machine Learning

Current theory of global sensitivity analysis, based on a nonlinear functional ANOVA decomposition of the random output, is limited in scope (for instance, the analysis is limited to the output's variance and the inputs have to be mutually independent) and leads to sensitivity indices whose interpretation is not fully clear, especially for interaction effects. Alternatively, sensitivity indices built for arbitrary user-defined importance measures have been proposed, but a theory to define interactions in a systematic fashion and/or establish a decomposition of the total importance measure is still missing. It is shown that these important problems are solved all at once by adopting a new paradigm. By partitioning the inputs into those causing the change in the output and those which do not, arbitrary user-defined variability measures are identified with the outcomes of a factorial experiment at two levels, leading to all factorial effects without assuming any functional decomposition. To link various well-known sensitivity indices of the literature (Sobol indices and Shapley effects), weighted factorial effects are studied and utilized.
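As a small illustration of how Sobol indices and Shapley effects relate in the classical setting, the sketch below computes Shapley effects from a variance-based value function. It is not the factorial framework of the paper: it assumes the toy model Y = X0 + 2*X1 + X0*X2 with independent standard normal inputs, whose ANOVA variance components (1 for X0, 4 for X1, 1 for the X0-X2 interaction) are worked out by hand, and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial

# Hand-derived ANOVA variance components of the assumed toy model
# Y = X0 + 2*X1 + X0*X2 with independent standard normal inputs.
components = {
    frozenset({0}): 1.0,     # main effect of X0
    frozenset({1}): 4.0,     # main effect of X1
    frozenset({0, 2}): 1.0,  # pure interaction between X0 and X2
}

def closed_sobol(S):
    """Unnormalized closed Sobol index Var(E[Y | X_S]): sum of components inside S."""
    S = frozenset(S)
    return sum(v for T, v in components.items() if T <= S)

def shapley_effects(d, value):
    """Shapley value of each input for the cooperative game defined by value."""
    effects = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        phi = 0.0
        for k in range(d):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)
            for S in combinations(others, k):
                phi += w * (value(S + (i,)) - value(S))
        effects.append(phi)
    return effects

effects = shapley_effects(3, closed_sobol)
print(effects)                     # close to 1.5, 4.0, 0.5
print(sum(effects))                # close to the total variance, 6.0
```

In this independent-input case each interaction component is split equally among the inputs involved, so X2 receives 0.5 even though its first-order Sobol index is zero.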


PWSHAP: A Path-Wise Explanation Model for Targeted Variables

arXiv.org Artificial Intelligence

Predictive black-box models can exhibit high accuracy but their opaque nature hinders their uptake in safety-critical deployment environments. Explanation methods (XAI) can provide confidence for decision-making through increased transparency. However, existing XAI methods are not tailored towards models in sensitive domains where one predictor is of special interest, such as a treatment effect in a clinical model, or ethnicity in policy models. We introduce Path-Wise Shapley effects (PWSHAP), a framework for assessing the targeted effect of a binary (e.g. treatment) variable from a complex outcome model. Our approach augments the predictive model with a user-defined directed acyclic graph (DAG). The method then uses the graph alongside on-manifold Shapley values to identify effects along causal pathways whilst maintaining robustness to adversarial attacks. We establish error bounds for the identified path-wise Shapley effects and for Shapley values. We show PWSHAP can perform local bias and mediation analyses with faithfulness to the model. Further, if the targeted variable is randomised we can quantify local effect modification. We demonstrate the resolution, interpretability, and true locality of our approach on examples and a real-world experiment.


SHAFF: Fast and consistent SHApley eFfect estimates via random Forests

arXiv.org Machine Learning

Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools. Shapley effects are now widely used to interpret both tree ensembles and neural networks, as they can efficiently handle dependence and interactions in the data, as opposed to most other variable importance measures. However, estimating Shapley effects is a challenging task, because of the computational complexity and the need to estimate conditional expectations. Accordingly, existing Shapley algorithms have flaws: a costly running time, or a bias when input variables are dependent. Therefore, we introduce SHAFF, SHApley eFfects via random Forests, a fast and accurate Shapley effect estimate, even when input variables are dependent. We show the efficiency of SHAFF through both a theoretical analysis of its consistency and practical performance improvements over competitors in extensive experiments. An implementation of SHAFF in C++ and R is available online.


Improving KernelSHAP: Practical Shapley Value Estimation via Linear Regression

arXiv.org Machine Learning

The Shapley value solution concept from cooperative game theory has become popular for interpreting ML models, but efficiently estimating Shapley values remains challenging, particularly in the model-agnostic setting. We revisit the idea of estimating Shapley values via linear regression to understand and improve upon this approach. By analyzing KernelSHAP alongside a newly proposed unbiased estimator, we develop techniques to detect its convergence and calculate uncertainty estimates. We also find that the original version incurs a negligible increase in bias in exchange for a significant reduction in variance, and we propose a variance reduction technique that further accelerates the convergence of both estimators. Finally, we develop a version of KernelSHAP for stochastic cooperative games that yields fast new estimators for two global explanation methods.
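The regression view of Shapley values can be sketched on a tiny game. The code below is not the estimator analyzed in the paper: it enumerates every coalition instead of sampling them, so it is only feasible for a handful of players. It assumes the standard Shapley kernel weights and enforces the efficiency constraint by eliminating the last coefficient; the toy game and the names `solve`, `kernel_shap_exact` and `value` are hypothetical.

```python
from itertools import combinations
from math import comb

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * x for a, x in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def kernel_shap_exact(d, value):
    """Weighted least squares over all proper, non-empty coalitions (requires d >= 2)."""
    v0 = value(())
    total = value(tuple(range(d))) - v0   # efficiency: coefficients must sum to this
    m = d - 1                             # last coefficient is eliminated
    A = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for k in range(1, d):
        w = (d - 1) / (comb(d, k) * k * (d - k))  # Shapley kernel weight
        for S in combinations(range(d), k):
            z = [1.0 if i in S else 0.0 for i in range(d)]
            x = [z[i] - z[-1] for i in range(m)]  # features after elimination
            t = value(S) - v0 - z[-1] * total     # target after elimination
            for i in range(m):
                b[i] += w * x[i] * t
                for j in range(m):
                    A[i][j] += w * x[i] * x[j]
    beta = solve(A, b)
    return beta + [total - sum(beta)]

def value(S):
    """Toy game (assumption): additive payoffs plus a bonus when players 0 and 2 cooperate."""
    S = set(S)
    return sum([1.0, 4.0, 0.0][i] for i in S) + (1.0 if {0, 2} <= S else 0.0)

phi = kernel_shap_exact(3, value)
print(phi)  # close to 1.5, 4.0, 0.5
```

With all coalitions included the regression recovers the exact Shapley values of the game; replacing the enumeration by sampled coalitions is what makes the approach practical for larger feature sets, at the cost of estimation error.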