An Additive Instance-Wise Approach to Multi-class Model Interpretation
Vy Vo, Van Nguyen, Trung Le, Quan Hung Tran, Gholamreza Haffari, Seyit Camtepe, Dinh Phung
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpretation methods focus on identifying explanatory input features, which generally fall into two main categories: attribution and selection. A popular attribution-based approach exploits local neighborhoods to learn instance-specific explainers in an additive manner; this process is inefficient and susceptible to poorly-conditioned samples. Selection-based methods, meanwhile, train instance-wise feature selectors; however, they can only interpret single-class predictions, and many suffer from inconsistency across different settings due to a strict reliance on a pre-defined number of selected features. This work exploits the strengths of both approaches and proposes a framework for learning local explanations simultaneously for multiple target classes. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness, with more compact and comprehensible explanations. We also demonstrate its capacity to select stable and important features through extensive experiments on various data sets and black-box model architectures.

Black-box machine learning systems enjoy remarkable predictive performance at the cost of interpretability. This trade-off has motivated a number of interpretation approaches for explaining the behavior of these complex models. Such explanations are particularly useful for high-stakes applications such as healthcare (Caruana et al., 2015; Rich, 2016), cybersecurity (Nguyen et al., 2021) or criminal investigation (Lipton, 2018). While model interpretation can be done in various ways (Mothilal et al., 2020; Bodria et al., 2021), our discussion focuses on the feature-importance or saliency-based approach, that is, assigning relative importance weights to individual features w.r.t. the model's prediction on an input example. Features here refer to input components interpretable to humans; for high-dimensional data such as text or images, features can be a bag of words/phrases or a group of pixels/super-pixels (Ribeiro et al., 2016). Explanations are generally made by selecting the top K features with the highest weights, signifying the K most important features for the black-box's decision.
arXiv.org Artificial Intelligence
Feb-9-2023
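
To make the top-K selection step described above concrete, here is a minimal, hypothetical sketch (not the authors' implementation): it assumes an explainer has already produced a vector of per-feature importance weights, and it simply returns the K features with the largest weights.

```python
import numpy as np

def top_k_features(importance_weights, feature_names, k=5):
    """Rank features by importance weight and return the top k.

    `importance_weights` is a 1-D array of per-feature weights produced by
    some explainer (hypothetical here; any saliency/attribution method applies).
    """
    order = np.argsort(importance_weights)[::-1][:k]
    return [(feature_names[i], float(importance_weights[i])) for i in order]

# Toy usage: weights over a 6-token bag-of-words input.
weights = np.array([0.05, 0.40, 0.10, 0.30, 0.02, 0.13])
tokens = ["the", "excellent", "movie", "boring", "a", "plot"]
print(top_k_features(weights, tokens, k=3))
# [('excellent', 0.4), ('boring', 0.3), ('plot', 0.13)]
```

In an instance-wise setting, a fixed K is applied to every input, which is the rigidity the abstract points to as a source of inconsistency across settings.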