deepshap
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments
Zhang, Yang, Li, Yawei, Brown, Hannah, Rezaei, Mina, Bischl, Bernd, Torr, Philip, Khakzar, Ashkan, Kawaguchi, Kenji
Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea rests on the assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we address this missing link by explicitly designing the neural network, manually setting its weights, along with designing the data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.
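The idea of fixing a model's weights by hand so that the relevant features are known in advance can be illustrated with a toy example. The sketch below is not the AttributionLab environment itself: the linear `designed_model`, the all-zero baseline, and the use of exact Shapley values as the attribution method under test are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def designed_model(x):
    # Hand-set weights (hypothetical): only features 0 and 1 are used;
    # features 2 and 3 are ignored by construction.
    weights = [2.0, -3.0, 0.0, 0.0]
    return sum(w * v for w, v in zip(weights, x))

def exact_shapley(model, x, baseline):
    # Exact Shapley values by enumerating every subset of the other features.
    # Features outside the subset are replaced by their baseline value.
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

x = [1.0, 2.0, 5.0, -4.0]
baseline = [0.0, 0.0, 0.0, 0.0]
phi = exact_shapley(designed_model, x, baseline)

# Ground truth is known because the weights were set by hand.
ground_truth = {0, 1}
attributed = {i for i, v in enumerate(phi) if abs(v) > 1e-9}
print(phi, attributed == ground_truth)
```

Because the relevant features are fixed by construction rather than learned, any nonzero attribution on features 2 or 3 would count against the attribution method, not against the model.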
Explaining a Series of Models by Propagating Shapley Values
Chen, Hugh, Lundberg, Scott M., Lee, Su-In
With the widespread adoption of machine learning (ML), series of models (i.e., where the outputs of predictive models are used as inputs to separate predictive models) are increasingly common. Examples include: (1) stacked generalization, a widely used technique [1-5] to improve generalization performance by ensembling the predictions of many models (called base-learners) using another model (called a meta-learner) [6], (2) neural network feature extraction, where models are trained on features extracted using neural networks [7, 8], typically for structured data [9-11], and (3) consumer scores, where predictive models that describe a specific behavior (e.g., credit scores [12]) are used as inputs to downstream predictive models. For example, a bank may use a model to predict customers' loan eligibility on the basis of their bank statements and their credit score, which itself is often a predictive model [13]. Explaining a series of models is crucial for debugging and building trust, even more so because a series of models is inherently harder to explain than a single model. One popular paradigm for explaining models is local feature attribution, which explains why a model makes a prediction for a single sample (known as the "explicand" [14]). Existing model-agnostic local feature attribution methods (e.g., IME [15], LIME [16], KernelSHAP [17]) work regardless of the specific model being explained. They can explain a series of models, but suffer from two distinct shortcomings: (1) their sampling-based estimates of feature importance are inherently variable, and (2) they have a high computational cost, which may make them intractable for large pipelines.
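The variability of sampling-based estimates can be seen on a toy two-stage pipeline. The sketch below is an illustration under stated assumptions, not the method of any cited paper: `credit_score` and `loan_model` are hypothetical stand-ins for an upstream and a downstream model, and the estimator is a plain permutation-sampling approximation of Shapley values against a single all-zero baseline.

```python
import random

def credit_score(x):
    # Stage 1 (hypothetical): upstream model with an interaction between features 0 and 1.
    return x[0] * x[1]

def loan_model(x):
    # Stage 2 (hypothetical): consumes the upstream score plus raw feature 2.
    return credit_score(x) + x[2]

def sampled_shapley(model, x, baseline, num_permutations, seed):
    # Model-agnostic estimate: average marginal contributions over random feature orderings.
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to explicand value
            output = model(current)
            phi[i] += output - previous_output
            previous_output = output
    return [v / num_permutations for v in phi]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
estimates = [
    sampled_shapley(loan_model, x, baseline, num_permutations=5, seed=s)
    for s in range(20)
]
for e in estimates[:3]:
    print(e)
```

Each estimate still sums to the difference between the pipeline's output on the explicand and on the baseline, but the split between the two interacting features changes with the seed. Every permutation also costs one model call per feature, which is the source of the computational burden on large pipelines.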
Explaining Models by Propagating Shapley Values of Local Components
Chen, Hugh, Lundberg, Scott, Lee, Su-In
In healthcare, making the best possible predictions with complex models (e.g., neural networks, ensembles/stacks of different models) can impact patient welfare. In order to make these complex models explainable, we present DeepSHAP for mixed model types, a framework for layer-wise propagation of Shapley values that builds upon DeepLIFT (an existing approach for explaining neural networks). We show that in addition to being able to explain neural networks, this new framework naturally enables attributions for stacks of mixed models (e.g., a neural network feature extractor feeding into a tree model) as well as attributions of the loss. Finally, we theoretically justify a method for obtaining attributions with respect to a background distribution (under a Shapley value framework).
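One common way to obtain attributions with respect to a background distribution is to compute attributions against each background sample as a single reference and then average them. The sketch below illustrates that averaging step only; it is not the DeepSHAP implementation. It assumes a linear model, for which the single-reference Shapley value of feature i is simply w_i * (x_i - r_i), and the names `single_reference_attribution` and `background_attribution` are hypothetical.

```python
def single_reference_attribution(weights, x, reference):
    # For a linear model, the Shapley value of feature i against one reference
    # is w_i * (x_i - r_i).
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]

def background_attribution(weights, x, background):
    # Average the single-reference attributions over all background samples.
    totals = [0.0] * len(x)
    for reference in background:
        for i, value in enumerate(single_reference_attribution(weights, x, reference)):
            totals[i] += value
    return [t / len(background) for t in totals]

def linear_model(weights, x):
    return sum(w * v for w, v in zip(weights, x))

weights = [2.0, -1.0]
x = [3.0, 4.0]
background = [[0.0, 0.0], [2.0, 4.0]]

phi = background_attribution(weights, x, background)
mean_background_output = sum(linear_model(weights, r) for r in background) / len(background)
print(phi, linear_model(weights, x) - mean_background_output)
```

In this example the averaged attributions sum to the model output on the explicand minus the mean output over the background samples, which is the quantity a background-relative explanation is expected to account for.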